We consider semiparametric location-scatter models for which
the p-variate observation is obtained as $X = AZ + \mu$, where $\mu$ is a
p-vector, $A$ is a full-rank $p \times p$ matrix and the (unobserved) random
p-vector Z has marginals that are centered and mutually indepen- 
dent but are otherwise unspecified. As in blind source separation 
and independent component analysis (ICA), the parameter of inter- 
est throughout the paper is A. On the basis of n i.i.d. copies of X, we 
develop, under a symmetry assumption on Z, signed-rank one-sample 
testing and estimation procedures for A. We exploit the uniform local 
and asymptotic normality (ULAN) of the model to define signed-rank 
procedures that are semiparametrically efficient under correctly spec- 
ified densities. Yet, as is usual in rank-based inference, the proposed 
procedures remain valid (correct asymptotic size under the null, for 
hypothesis testing, and root-n consistency, for point estimation) un- 
der a very broad range of densities. We derive the asymptotic prop- 
erties of the proposed procedures and investigate their finite-sample 
behavior through simulations. 

1. Introduction. In multivariate statistics, concepts of location and
scatter are usually defined through affine transformations of a noise
vector. To be more specific, assume that the observation $X$ is obtained
through

(1.1) $X = AZ + \mu$,

where $\mu$ is a p-vector, $A$ is a full-rank $p \times p$ matrix and $Z$ is
some standardized random vector. The exact nature of the resulting
location parameter $\mu$, mixing matrix parameter $A$, and scatter
parameter $\Sigma = AA'$ crucially depends on the standardization adopted.

The most classical assumption on $Z$ specifies that $Z$ is standard
p-normal. Then $\mu$ and $\Sigma$ simply coincide with the mean vector
$E[X]$ and variance-covariance matrix $\mathrm{Var}[X]$ of $X$, respectively.
In robust statistics, it is often assumed instead that $Z$ is spherically
symmetric about the origin of $\mathbb{R}^p$, in the sense that the
distribution of $OZ$ does not depend on the orthogonal $p \times p$ matrix
$O$. The resulting model in (1.1) is then called the elliptical model.
If $Z$ has finite second-order moments, then $\mu = E[X]$ and
$\Sigma = c\,\mathrm{Var}[X]$ for some $c > 0$, but (1.1) allows one to define
$\mu$ and $\Sigma$ in the absence of any moment assumption.

This paper focuses on an alternative standardization of Z, for which Z 
has mutually independent marginals with common median zero. The result- 
ing model in (1.1) — the independent component (IC) model, say — is more 
flexible than the elliptical model, even if one restricts, as we will do, to 
vectors Z with symmetrically distributed marginals. The IC model indeed 
allows for heterogeneous marginal distributions for X, whereas, in contrast, 
marginals in the elliptical model all share — up to location and scale — the 
same distribution, hence also the same tail weight. This severely affects the 
relevance of elliptical models for practical applications, particularly so for 
moderate to large dimensions, since it is then very unlikely that all variables 
share, for example, the same tail weight. 

The IC model provides the most standard setup for independent
component analysis (ICA), in which the mixing matrix $A$ is to be
estimated on the basis of n independent copies $X_1, \ldots, X_n$ of $X$,
the objective being to recover (up to a translation) the original
unobservable independent signals $Z_1, \ldots, Z_n$ by premultiplying the
$X_i$'s with the resulting estimate of $A^{-1}$. It is well known in ICA,
however, that $A$ is severely unidentified: for any $p \times p$
permutation matrix $P$ and any full-rank diagonal matrix $D$, one can
always write

(1.2) $X = [APD][(PD)^{-1}Z] + \mu = \tilde{A}\tilde{Z} + \mu$,

where $\tilde{Z}$ still has independent marginals with median zero.
Provided that $Z$ has at most one Gaussian marginal, two matrices $A_1$
and $A_2$ may lead to the same distribution for $X$ in (1.1) if and only
if they are equivalent (we will write $A_1 \sim A_2$) in the sense that
$A_2 = A_1PD$ for some matrices $P$ and $D$ as in (1.2); see, for example,
[25]. In other words, under the assumption that $Z$ has at most one
Gaussian marginal, permutations ($P$), sign changes and scale
transformations ($D$) of the independent components are the only sources
of unidentifiability for $A$.
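The unidentifiability in (1.2) can be checked numerically. The following is a minimal sketch under illustrative assumptions: the dimension, the Laplace-distributed components and the particular $P$ and $D$ are arbitrary choices made for the example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5

A = rng.normal(size=(p, p))            # mixing matrix (full rank with probability one)
mu = rng.normal(size=p)                # location vector
Z = rng.laplace(size=(n, p))           # row i holds the independent components Z_i

P = np.eye(p)[:, [2, 0, 1]]            # a permutation matrix (illustrative choice)
D = np.diag([2.0, -0.5, 3.0])          # full-rank diagonal matrix: sign and scale changes

A_tilde = A @ P @ D                    # equivalent mixing matrix APD
Z_tilde = Z @ np.linalg.inv(P @ D).T   # row i holds (PD)^{-1} Z_i

X = Z @ A.T + mu                       # X_i = A Z_i + mu
X_tilde = Z_tilde @ A_tilde.T + mu     # X_i = (APD)[(PD)^{-1} Z_i] + mu

print(np.allclose(X, X_tilde))
```

Both parametrizations produce the same observations, although `A` and `A_tilde` differ, which is why inference has to target a normalized representative of the equivalence class.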

This paper considers inference on the mixing matrix $A$. More precisely,
because of the identifiability issues above, we consider instead a
normalized version $L$ of $A$, where $L$ is a well-defined representative
of the class of mixing matrices that are equivalent to $A$. This
parameter $L$ is actually the parameter of interest in ICA: an estimate
of $L$ will indeed allow one to recover the independent signals
$Z_1, \ldots, Z_n$ equally well as an estimate of any other $A$ with
$A \sim L$. Interestingly, the situation is extremely similar when
considering inference on $\Sigma$ in the elliptical model. There, $\Sigma$
is only identified up to a positive scalar factor, and it is often
enough to focus on inference about the well-defined shape parameter
$V = \Sigma/(\det\Sigma)^{1/p}$ (e.g., in PCA, principal directions,
proportions of explained variance, etc. can be computed from $V$). Just
as $L$ is a normalized version of $A$ in the IC model, $V$ is a normalized
version of $\Sigma$ in the elliptical model, and in both classes of
models, the normalized parameters actually are the natural parameters
of interest in many inference problems. The similarities further extend
to the semiparametric nature of both models: just like the density
$g_{\|\cdot\|}$ of $\|Z\|$ in the elliptical model, the pdf $g_r$ of the
various independent components $Z_r$, $r = 1, \ldots, p$, in the IC model
can hardly be assumed to be known in practice.

These strong similarities motivate the approach we adopt in this paper:
we plan to conduct inference on $L$ (hypothesis testing and point
estimation) in the IC model by adopting the methodology that proved
extremely successful in [7, 8] for inference on $V$ in the elliptical
model. This methodology combines semiparametrically efficient inference
and invariance arguments. In the IC model, the fixed-$(\mu, A)$
nonparametric submodels (indexed by $g_1, \ldots, g_p$) indeed enjoy a
strong invariance structure that is parallel to the one of the
corresponding elliptical submodels (indexed by $g_{\|\cdot\|}$). As in
[7, 8], we exploit this invariance structure through a general result
from [11] that allows one to derive invariant versions of efficient
central sequences, on the basis of which one can define
semiparametrically efficient (at fixed target densities $g_r = f_r$,
$r = 1, \ldots, p$) invariant procedures. As the maximal invariant
associated with the invariance structure considered turns out to be the
vector of marginal signed ranks of the residuals, the proposed
procedures are of a signed-rank nature and do not require estimating
densities. While they achieve semiparametric efficiency under correctly
specified densities, they remain valid (correct asymptotic size under
the null, for hypothesis testing, and root-n consistency, for point
estimation) under misspecified densities.

We will consider the problem of estimating $L$ and that of testing the
null $\mathcal{H}_0 : L = L_0$ against the alternative
$\mathcal{H}_1 : L \neq L_0$, for some fixed $L_0$. While point estimation
is undoubtedly of primary importance for applications (e.g., in blind
source separation), one might question the practical relevance of the
testing problem considered, especially when $L_0$ is not the
p-dimensional identity matrix. Solving this generic testing problem,
however, is the main step in developing tests for any linear hypothesis
on $L$, and we will explicitly describe the resulting tests in the
sequel. An extensive study of these tests is beyond the scope of the
present paper, though; we refer to [19] for an extension of our tests to
the particular case of testing the (linear) hypothesis that $L$ is
block-diagonal, a problem that is obviously important in practice
(nonrejection of the null would indeed allow practitioners to proceed
with two separate, lower-dimensional, analyses). Testing linear
hypotheses on $L$ includes many other testing problems of high practical
relevance, such as testing that a given column of $L$ is equal to some
fixed p-vector, and testing that a given entry of $L$ is zero; the
practical importance of these two testing problems, in relation, for
example, with functional magnetic resonance imaging (fMRI), is discussed
in [22].

The paper is organized as follows. In Section 2, we fix the notation and
describe the model (Section 2.1), state the corresponding uniformly
locally and asymptotically normal (ULAN) property that allows us to
determine semiparametric efficiency bounds (Section 2.2) and then
introduce, in relation with invariance arguments, rank-based efficient
central sequences (Section 2.3). In Sections 3 and 4, we develop the
resulting rank tests and estimators for the mixing matrix $L$,
respectively. Our estimators actually require the delicate estimation of
$2p(p-1)$ "cross-information coefficients," an issue we solve in Section
4.2 by generalizing the method recently developed in [5]. In Section 5,
simulations are conducted both to compare the proposed estimators with
some competitors and to investigate the validity of asymptotic results;
simulation results for hypothesis testing are provided in the
supplementary article [16]. Finally, the Appendix states some technical
results (Appendix A) and reports proofs (Appendix B).

2. The model, the ULAN property and invariance arguments. 

2.1. The model. As we already explained, the IC model above suffers
from severe identifiability issues for $A$. To solve this, we map each
$A$ onto a unique representative $L = \Pi(A)$ of the collection of mixing
matrices $\tilde{A}$ that satisfy $\tilde{A} \sim A$ (the equivalence class
of $A$ for $\sim$). We propose the mapping

$A \mapsto \Pi(A) = A D_1^+ P D_2$,

where $D_1^+$ is the positive definite diagonal matrix that makes each
column of $A D_1^+$ have Euclidean norm one, $P$ is the permutation matrix
for which the matrix $B = (b_{ij}) = A D_1^+ P$ satisfies
$|b_{ii}| > |b_{ij}|$ for all $i < j$ and $D_2$ is the diagonal matrix such
that all diagonal entries of $\Pi(A) = A D_1^+ P D_2$ are equal to one.
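The three steps of the mapping $\Pi$ can be sketched as follows. This is only an illustration: the function name `normalize_mixing` is hypothetical, and the permutation step is realized here by a greedy column search (row by row, pick the remaining column with the largest absolute entry), which is one way to obtain $|b_{ii}| > |b_{ij}|$ for $i < j$ when no ties occur.

```python
import numpy as np

def normalize_mixing(A):
    """Map a full-rank mixing matrix A to a representative with unit diagonal."""
    B = np.asarray(A, dtype=float).copy()
    p = B.shape[0]
    # Step 1 (D_1^+): rescale every column to Euclidean norm one.
    B /= np.linalg.norm(B, axis=0)
    # Step 2 (P): greedy column permutation so that |b_ii| >= |b_ij| for j > i.
    for i in range(p):
        j = i + np.argmax(np.abs(B[i, i:]))
        B[:, [i, j]] = B[:, [j, i]]
    # Step 3 (D_2): rescale every column so that the diagonal equals one.
    return B / np.diag(B)

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
P = np.eye(4)[:, [3, 0, 2, 1]]             # illustrative permutation
D = np.diag([2.0, -0.3, 1.5, -4.0])        # illustrative sign and scale changes
L = normalize_mixing(A)
print(np.allclose(normalize_mixing(A @ P @ D), L))
```

In the absence of ties, equivalent mixing matrices such as `A` and `A @ P @ D` are mapped to the same representative, which is the property the normalization is designed to deliver.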

If one restricts to the collection $\mathcal{M}_p$ of mixing matrices $A$
for which no ties occur in the permutation step above, it can easily be
shown that, for any $A_1, A_2 \in \mathcal{M}_p$, we have that
$A_1 \sim A_2$ iff $\Pi(A_1) = \Pi(A_2)$, so that this mechanism succeeds
in identifying a unique representative in each equivalence class (this
is ensured by the double scaling scheme above, which may seem a bit
complicated at first). Besides, $\Pi$ is then a continuously
differentiable mapping from $\mathcal{M}_p$ onto
$\mathcal{M}_{1p} := \Pi(\mathcal{M}_p)$. While ties may always be taken care
of in some way (e.g., by basing the ordering on subsequent rows of the
matrix $B$), they may prevent the mapping $\Pi$ from being continuous,
which would cause severe problems and would prevent us from using the
Delta method in the sequel. It is clear, however, that the restriction
to $\mathcal{M}_p$ only gets rid of a few particular mixing matrices, and
will not have any implications in practice.

The parametrization of the IC model we consider is then associated with

(2.1) $X = LZ + \mu$,

where $\mu \in \mathbb{R}^p$, $L \in \mathcal{M}_{1p}$ and $Z$ has independent
marginals with common median zero. Throughout, we further assume that
$Z$ admits a density with respect to the Lebesgue measure on
$\mathbb{R}^p$, and that it has p symmetrically distributed marginals,
among which at most one is Gaussian (as explained in the Introduction,
this limitation on the number of Gaussian components is needed for $L$
to be identifiable). We will denote by $\mathcal{F}$ the resulting
collection of densities for $Z$. Of course, any $g \in \mathcal{F}$
naturally factorizes into $g(z) = \prod_{r=1}^{p} g_r(z_r)$, where $g_r$ is
the symmetric density of $Z_r$.

The hypothesis under which n mutually independent observations $X_i$,
$i = 1, \ldots, n$, are obtained from (2.1), where $Z$ has density
$g \in \mathcal{F}$, will be denoted as $P^{(n)}_{\vartheta,g}$, with
$\vartheta = (\mu', (\mathrm{vecd}^\circ L)')' \in \Theta = \mathbb{R}^p \times \mathrm{vecd}^\circ(\mathcal{M}_{1p})$,
or alternatively, as $P^{(n)}_{\mu,L,g}$; for any $p \times p$ matrix $A$,
we write $\mathrm{vecd}^\circ A$ for the $p(p-1)$-vector obtained by
removing the p diagonal entries of $A$ from its usual vectorized form
$\mathrm{vec}\,A$ (diagonal entries of $L$ are all equal to one, hence should
not be included in the parameter).
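The operator $\mathrm{vecd}^\circ$ is linear, so it can be written as a selection matrix acting on $\mathrm{vec}\,A$. The sketch below builds both; the function names `vecd` and `selection_matrix` are hypothetical, and the Kronecker construction assumes the index shift $s + 1_{[s \geq r]}$, which is what makes the matrix skip the diagonal entry of column $r$.

```python
import numpy as np

def vecd(A):
    """Stack the columns of A after dropping the diagonal entries."""
    p = A.shape[0]
    off_diagonal = ~np.eye(p, dtype=bool)
    # Transposing first makes boolean indexing run through A column by column.
    return A.T[off_diagonal]

def selection_matrix(p):
    """Build the p(p-1) x p^2 matrix C such that C @ vec(A) equals vecd(A)."""
    e, u = np.eye(p), np.eye(p - 1)
    C = np.zeros((p * (p - 1), p * p))
    for r in range(p):
        for s in range(p - 1):
            t = s + (s >= r)          # skip the diagonal position in column r
            C += np.kron(np.outer(e[r], e[r]), np.outer(u[s], e[t]))
    return C

A = np.arange(9.0).reshape(3, 3)
C = selection_matrix(3)
print(vecd(A))                         # off-diagonal entries, column by column
print(np.allclose(C @ A.flatten(order="F"), vecd(A)))
```

Here `A.flatten(order="F")` is the usual column-stacking `vec` operator, so the last line checks that the selection matrix and the direct extraction agree.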

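The resulting semiparametric model is then

(2.2) $\mathcal{P}^{(n)} := \bigcup_{g \in \mathcal{F}} \mathcal{P}_g^{(n)} := \bigcup_{g \in \mathcal{F}} \bigcup_{\vartheta \in \Theta} \{P^{(n)}_{\vartheta,g}\}$.

Performing semiparametrically efficient inference on $\vartheta$, at a
fixed $f \in \mathcal{F}$, typically requires that the corresponding
parametric submodel $\mathcal{P}_f^{(n)}$ satisfies the uniformly locally
and asymptotically normal (ULAN) property.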

2.2. The ULAN property. As always, the ULAN property requires technical
regularity conditions on $f$. In the present context, we need that each
corresponding univariate pdf $f_r$, $r = 1, \ldots, p$, is absolutely
continuous (with derivative $f_r'$, say) and satisfies

$\sigma^2_{f_r} := \int_{-\infty}^{\infty} y^2 f_r(y)\,dy < \infty$,   $\mathcal{I}_{f_r} := \int_{-\infty}^{\infty} \varphi_{f_r}^2(y) f_r(y)\,dy < \infty$

and

$\mathcal{J}_{f_r} := \int_{-\infty}^{\infty} y^2 \varphi_{f_r}^2(y) f_r(y)\,dy < \infty$,

where we let $\varphi_{f_r} := -f_r'/f_r$. In the sequel, we denote by
$\mathcal{F}_{\mathrm{ulan}}$ the collection of pdfs $f \in \mathcal{F}$ meeting
these conditions.

For any $f \in \mathcal{F}_{\mathrm{ulan}}$, let
$\gamma_{rs}(f) := \mathcal{I}_{f_r}\sigma^2_{f_s}$, define the optimal
p-variate location score function
$\varphi_f : \mathbb{R}^p \to \mathbb{R}^p$ through
$z = (z_1, \ldots, z_p)' \mapsto \varphi_f(z) = (\varphi_{f_1}(z_1), \ldots, \varphi_{f_p}(z_p))'$,
and denote by $\mathcal{I}_f$ the diagonal matrix with diagonal entries
$\mathcal{I}_{f_r}$, $r = 1, \ldots, p$. Further write $I_\ell$ for the
$\ell$-dimensional identity matrix and define

$C := \sum_{r=1}^{p} \sum_{s=1}^{p-1} (e_r e_r' \otimes u_s e'_{s+\delta_{[s \geq r]}})$,

where $\otimes$ is the usual Kronecker product, $e_r$ and $u_r$ stand for
the rth vectors of the canonical basis of $\mathbb{R}^p$ and
$\mathbb{R}^{p-1}$, respectively, and $\delta_{[s \geq r]}$ is equal to one
if $s \geq r$ and to zero otherwise; with this definition,
$C(\mathrm{vec}\,A) = \mathrm{vecd}^\circ A$ for any $p \times p$ matrix $A$. The
following ULAN result then easily follows from Proposition 2.1 in [19]
by using a simple chain rule argument.

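Proposition 2.1. Fix $f \in \mathcal{F}_{\mathrm{ulan}}$. Then the collection
of probability distributions $\mathcal{P}_f^{(n)}$ is ULAN, with central
sequence

(2.3) $\Delta_{\vartheta,f} = \begin{pmatrix} \Delta_{\vartheta,f;1} \\ \Delta_{\vartheta,f;2} \end{pmatrix} = \begin{pmatrix} n^{-1/2} (L^{-1})' \sum_{i=1}^{n} \varphi_f(Z_i) \\ n^{-1/2}\, C (I_p \otimes L^{-1})' \sum_{i=1}^{n} \mathrm{vec}(\varphi_f(Z_i) Z_i' - I_p) \end{pmatrix}$,

where $Z_i = Z_i(\vartheta) = L^{-1}(X_i - \mu)$, and full-rank information
matrix

$\Gamma_{L,f} = \begin{pmatrix} \Gamma_{L,f;1} & 0 \\ 0 & \Gamma_{L,f;2} \end{pmatrix}$,

where $\Gamma_{L,f;1} := (L^{-1})' \mathcal{I}_f L^{-1}$ and

$\Gamma_{L,f;2} := C(I_p \otimes L^{-1})' \Big[ \sum_{r=1}^{p} (\mathcal{J}_{f_r} - 1)(e_r e_r' \otimes e_r e_r') + \sum_{r,s=1,\, r \neq s}^{p} \big(\gamma_{sr}(f)(e_r e_r' \otimes e_s e_s') + (e_r e_s' \otimes e_s e_r')\big) \Big] (I_p \otimes L^{-1}) C'$.

More precisely, for any $\vartheta_n = \vartheta + O(n^{-1/2})$ [with
$\vartheta = (\mu', (\mathrm{vecd}^\circ L)')'$] and any bounded sequence
$(\tau_n)$ in $\mathbb{R}^{p^2}$, we have that, under
$P^{(n)}_{\vartheta_n,f}$ as $n \to \infty$,

$\log(dP^{(n)}_{\vartheta_n + n^{-1/2}\tau_n,f} / dP^{(n)}_{\vartheta_n,f}) = \tau_n' \Delta_{\vartheta_n,f} - \frac{1}{2} \tau_n' \Gamma_{L,f} \tau_n + o_P(1)$,

and $\Delta_{\vartheta_n,f}$ converges in distribution to a $p^2$-variate
normal distribution with mean zero and covariance matrix $\Gamma_{L,f}$.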

Semiparametrically efficient (at $f$) inference procedures on $L$ then
may be based on the so-called efficient central sequence
$\Delta^*_{\vartheta,f;2}$ resulting from $\Delta_{\vartheta,f;2}$ by performing
adequate tangent space projections; see [3]. Under $P^{(n)}_{\vartheta,f}$,
$\Delta^*_{\vartheta,f;2}$ is still asymptotically normal with mean zero, but
now with covariance matrix $\Gamma^*_{L,f;2}$ (the efficient information
matrix). This matrix $\Gamma^*_{L,f;2}$ settles the semiparametric
efficiency bound at $f$ when performing inference on $L$. For instance,
an estimator $\hat{L}$ is semiparametrically efficient at $f$ if

(2.4) $\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L} - L) \xrightarrow{\mathcal{L}} \mathcal{N}_{p(p-1)}(0, (\Gamma^*_{L,f;2})^{-1})$.

The performance of semiparametrically efficient tests on $L$ can
similarly be characterized in terms of $\Gamma^*_{L,f;2}$: a test of
$\mathcal{H}_0 : L = L_0$ is semiparametrically efficient at $f$ (at
asymptotic level $\alpha$) if its asymptotic powers under local
alternatives of the form $\mathcal{H}_1^{(n)} : L = L_0 + n^{-1/2}H$, where
$H$ is an arbitrary $p \times p$ matrix with zero diagonal entries, are
given by

(2.5) $1 - \Psi_{p(p-1)}\big(\chi^2_{p(p-1),1-\alpha};\ (\mathrm{vecd}^\circ H)' \Gamma^*_{L_0,f;2} (\mathrm{vecd}^\circ H)\big)$,

where $\chi^2_{p(p-1),1-\alpha}$ stands for the $\alpha$-upper quantile of
the $\chi^2_{p(p-1)}$ distribution, and $\Psi_{p(p-1)}(\cdot\,; \delta)$
denotes the cumulative distribution function of the noncentral
$\chi^2_{p(p-1)}$ distribution with noncentrality parameter $\delta$.

2.3. Invariance arguments. Instead of the classical tangent space
projection approach to compute $\Delta^*_{\vartheta,f;2}$ (as in [6]), we
adopt an approach, due to [11], that exploits the invariance structure
of the model considered. This will provide a version of the efficient
central sequence (parallel to central sequences, efficient central
sequences are defined up to $o_P(1)$'s only) that is based on signed
ranks. Here, signed ranks are defined as
$S_i(\vartheta) = (S_{i1}(\vartheta), \ldots, S_{ip}(\vartheta))'$ and
$R_i^+(\vartheta) = (R_{i1}^+(\vartheta), \ldots, R_{ip}^+(\vartheta))'$, where
$S_{ir}(\vartheta)$ is the sign of $Z_{ir}(\vartheta) = (L^{-1}(X_i - \mu))_r$
and $R_{ir}^+(\vartheta)$ is the rank of $|Z_{ir}(\vartheta)|$ among
$|Z_{1r}(\vartheta)|, \ldots, |Z_{nr}(\vartheta)|$. This signed-rank efficient
central sequence, $\underline{\Delta}^*_{\vartheta,f;2}$ say, is given in
Theorem 2.1 below (the asymptotic behavior of
$\underline{\Delta}^*_{\vartheta,f;2}$ will be studied in Appendix A).
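The signs and ranks defined above are straightforward to compute from the residuals. The following is a minimal sketch; the function name `signed_ranks` is hypothetical, and ties among the absolute residuals are assumed not to occur (they have probability zero under the continuity assumption on $Z$).

```python
import numpy as np

def signed_ranks(X, L, mu):
    """Return residuals Z_i = L^{-1}(X_i - mu), their signs, and ranks of |Z_ir|."""
    Z = np.linalg.solve(L, (X - mu).T).T       # n x p matrix of residuals
    S = np.sign(Z)                              # componentwise signs
    # Double argsort gives, column by column, the rank (1..n) of each |Z_ir|.
    R = np.abs(Z).argsort(axis=0).argsort(axis=0) + 1
    return Z, S, R

X = np.array([[0.5, -2.0], [-1.5, 1.0], [3.0, -0.1]])
Z, S, R = signed_ranks(X, np.eye(2), np.zeros(2))
print(S)
print(R)
```

Ranks are computed separately within each component $r$, which matches the marginal nature of the signed ranks used throughout.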

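To be able to state Theorem 2.1, we need to introduce the following
notation. Let $z \mapsto F_+(z) = (F_{+1}(z_1), \ldots, F_{+p}(z_p))'$, with
$F_{+r}(t) := P^{(n)}_{\vartheta,f}[|Z_r(\vartheta)| < t] = 2\big(\int_{-\infty}^{t} f_r(s)\,ds\big) - 1$,
$t \geq 0$. Based on this, define
$\underline{\Delta}^*_{\vartheta,f;2} := C(I_p \otimes L^{-1})'\, \mathrm{vec}\,\underline{T}_{\vartheta,f}$,
with

$\underline{T}_{\vartheta,f} := \mathrm{odiag}\Big[ \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big( S_i(\vartheta) \odot \varphi_f\Big(F_+^{-1}\Big(\frac{R_i^+(\vartheta)}{n+1}\Big)\Big) \Big) \Big( S_i(\vartheta) \odot F_+^{-1}\Big(\frac{R_i^+(\vartheta)}{n+1}\Big) \Big)' \Big]$,

where $\odot$ is the Hadamard (i.e., entrywise) product of two vectors,
and where $\mathrm{odiag}(A)$ denotes the matrix obtained from $A$ by
replacing all diagonal entries with zeros. Finally, let
$\mathfrak{F}_{\mathrm{ulan}}$ be the collection of pdfs
$f \in \mathcal{F}_{\mathrm{ulan}}$ for which each $\varphi_{f_r}$,
$r = 1, \ldots, p$, is continuous and can be written as the difference of
two monotone increasing functions. We then have the following result
(see Appendix B for a proof).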

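Theorem 2.1. Fix $\vartheta = (\mu', (\mathrm{vecd}^\circ L)')' \in \Theta$ and
$f \in \mathfrak{F}_{\mathrm{ulan}}$. Then, (i) denoting by
$E^{(n)}_{\vartheta,f}$ expectation under $P^{(n)}_{\vartheta,f}$,

$\underline{\Delta}^*_{\vartheta,f;2} = E^{(n)}_{\vartheta,f}[\Delta_{\vartheta,f;2} \mid S_1(\vartheta), \ldots, S_n(\vartheta), R_1^+(\vartheta), \ldots, R_n^+(\vartheta)] + o_{L^2}(1)$

as $n \to \infty$, under $P^{(n)}_{\vartheta,f}$; (ii) the signed-rank quantity
$\underline{\Delta}^*_{\vartheta,f;2}$ is a version of the efficient central
sequence at $f$ [i.e.,
$\underline{\Delta}^*_{\vartheta,f;2} = \Delta^*_{\vartheta,f;2} + o_{L^2}(1)$ as
$n \to \infty$, under $P^{(n)}_{\vartheta,f}$].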

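If the (nonparametric) fixed-$\vartheta$ submodels
$\mathcal{P}_\vartheta^{(n)} := \bigcup_{g \in \mathcal{F}} \{P^{(n)}_{\vartheta,g}\}$
of the semiparametric model
$\bigcup_{\vartheta \in \Theta} \bigcup_{g \in \mathcal{F}} \{P^{(n)}_{\vartheta,g}\}$
in (2.2) are invariant under a group of transformations
$\mathcal{G}^\vartheta$ that generates $\mathcal{P}_\vartheta^{(n)}$, then the
main result of [11] shows that the expectation of the original central
sequence $\Delta_{\vartheta,f;2}$ conditional upon the corresponding maximal
invariant, $\mathcal{I}^{(n)}_{\max}(\vartheta)$ say, is a version of the
efficient central sequence $\Delta^*_{\vartheta,f;2}$ at $f$: as
$n \to \infty$, under $P^{(n)}_{\vartheta,f}$,

(2.6) $\Delta^*_{\vartheta,f;2} = E^{(n)}_{\vartheta,f}[\Delta_{\vartheta,f;2} \mid \mathcal{I}^{(n)}_{\max}(\vartheta)] + o_{L^2}(1)$.

Such an invariance structure actually exists and the relevant group
$\mathcal{G}^\vartheta$ collects all transformations

$g_h^\vartheta : \mathbb{R}^p \times \cdots \times \mathbb{R}^p \to \mathbb{R}^p \times \cdots \times \mathbb{R}^p$,

$(x_1, \ldots, x_n) \mapsto (L\,h(z_1(\vartheta)) + \mu, \ldots, L\,h(z_n(\vartheta)) + \mu)$,

with $z_i(\vartheta) := L^{-1}(x_i - \mu)$ and
$h((z_1, \ldots, z_p)') = (h_1(z_1), \ldots, h_p(z_p))'$, where each $h_r$,
$r = 1, \ldots, p$, is continuous, odd, monotone increasing and fixes
$+\infty$. It is easy to check that $\mathcal{P}_\vartheta^{(n)}$ is invariant
under (and is generated by) $\mathcal{G}^\vartheta$, and that the
corresponding maximal invariant is the vector of signed ranks

(2.7) $\mathcal{I}^{(n)}_{\max}(\vartheta) = (S_1(\vartheta), \ldots, S_n(\vartheta), R_1^+(\vartheta), \ldots, R_n^+(\vartheta))$.

Theorem 2.1(ii) then follows from (2.6) and Theorem 2.1(i).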

Inference procedures based on $\underline{\Delta}^*_{\vartheta,f;2}$, unlike
those (from [6]) based on the efficient central sequence
$\Delta^*_{\vartheta,f;2}$ obtained through tangent space projections, are
measurable with respect to signed ranks, hence enjoy all the nice
properties usually associated with rank methods: robustness, ease of
computation, validity without density estimation (and, for hypothesis
testing, even distribution-freeness), etc.
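To illustrate the invariance that underlies these properties, here is a minimal sketch of a signed-rank matrix statistic of the type used in Section 2.3. It rests on loudly stated assumptions: every target marginal $f_r$ is taken to be standard normal (van der Waerden scores), so that $\varphi_{f_r}(z) = z$ and $F_{+r}^{-1}(u) = \Phi^{-1}((1+u)/2)$, and the statistic is taken to be the off-diagonal part of $n^{-1/2} \sum_i V_i V_i'$, where $V_i$ collects the signed scores of observation $i$. The function name `vdw_rank_statistic` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def vdw_rank_statistic(Z):
    """Off-diagonal signed-rank statistic with Gaussian (van der Waerden) scores."""
    n, p = Z.shape
    S = np.sign(Z)
    R = np.abs(Z).argsort(axis=0).argsort(axis=0) + 1   # ranks of |Z_ir| within column r
    u = R / (n + 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    # For Gaussian f, phi_f(z) = z, so the two score factors coincide.
    scores = S * inv_cdf((1 + u) / 2)
    T = scores.T @ scores / np.sqrt(n)
    np.fill_diagonal(T, 0.0)                             # odiag: zero out the diagonal
    return T

rng = np.random.default_rng(1)
Z = rng.standard_t(df=3, size=(50, 3))
T = vdw_rank_statistic(Z)
# Odd, monotone increasing componentwise maps leave signs and ranks unchanged.
print(np.allclose(T, vdw_rank_statistic(Z ** 3)))
```

Because the statistic depends on the residuals only through their signs and the ranks of their absolute values, it is unchanged by any componentwise continuous, odd, monotone increasing transformation, which is the invariance property exploited in this section.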

3. Hypothesis testing. We now consider the problem of testing the null
hypothesis $\mathcal{H}_0 : L = L_0$ against the alternative
$\mathcal{H}_1 : L \neq L_0$, with unspecified underlying density $g$.
Beyond their intrinsic interest, the resulting tests will play an
important role in the construction of the R-estimators of Section 4
below, and they pave the way to testing linear hypotheses on $L$.

The objective here is to define a test that is semiparametrically
efficient at some target density $f$, yet that remains valid (in the
sense that it meets asymptotically the level constraint) under a very
broad class of densities $g$. As we will show, this objective is
achieved by the signed-rank test, $\underline{\phi}_f$ say, that rejects
$\mathcal{H}_0$ at asymptotic level $\alpha \in (0, 1)$ whenever

(3.1) $Q_f := (\underline{\Delta}^*_{\hat\vartheta_0,f;2})' (\Gamma^*_{L_0,f;2})^{-1} \underline{\Delta}^*_{\hat\vartheta_0,f;2} > \chi^2_{p(p-1),1-\alpha}$,

where $\Gamma^*_{L,f;2}$ was introduced in Section 2.2 (an explicit
expression is given below) and where
$\hat\vartheta_0 = (\hat\mu', (\mathrm{vecd}^\circ L_0)')'$ is based on a
sequence of estimators $\hat\mu$ that is locally asymptotically discrete
(see Appendix A for a precise definition) and root-n consistent under
the null.

Possible choices for $\hat\mu$ include (discretized versions of) the
sample mean $\bar{X} := \frac{1}{n}\sum_{i=1}^{n} X_i$ or the
transformation-retransformation componentwise median
$\hat\mu_{\mathrm{Med}} := L_0\,\mathrm{Med}[L_0^{-1}X_1, \ldots, L_0^{-1}X_n]$,
where $\mathrm{Med}[\cdot]$ returns the vector of univariate medians. We
favor the sign estimator $\hat\mu_{\mathrm{Med}}$, since it is very much in
line with the signed-rank tests $\underline{\phi}_f$, and enjoys good
robustness properties. However, we stress that Theorem 3.1 below, which
states the asymptotic properties of the proposed signed-rank tests,
implies that the choice of $\hat\mu$ does not affect the asymptotic
properties of $\underline{\phi}_f$, at any
$g \in \mathcal{F}_{\mathrm{ulan}}$.
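The transformation-retransformation median is a two-line computation: transform the data by $L_0^{-1}$, take componentwise medians, and map the result back through $L_0$. The sketch below is illustrative only; the function name `tr_median` is hypothetical and no discretization step is included.

```python
import numpy as np

def tr_median(X, L0):
    """Transformation-retransformation componentwise median of the rows of X."""
    Z = np.linalg.solve(L0, X.T).T          # rows are L0^{-1} X_i
    return L0 @ np.median(Z, axis=0)        # map the componentwise median back

rng = np.random.default_rng(2)
X = rng.normal(size=(101, 3))
L0 = np.array([[1.0, 0.2, 0.0],
               [0.5, 1.0, 0.1],
               [0.0, 0.3, 1.0]])            # illustrative null value with unit diagonal
c = np.array([1.0, -2.0, 0.5])
# Shifting every observation by c shifts the estimate by c.
print(np.allclose(tr_median(X + c, L0), tr_median(X, L0) + c))
```

With `L0` equal to the identity matrix, the estimator reduces to the ordinary vector of componentwise medians.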

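In order to state this theorem, we need to define

(3.2) $\Gamma^*_{L,f,g;2} := C(I_p \otimes L^{-1})' G_{f,g} (I_p \otimes L^{-1}) C'$
$:= C(I_p \otimes L^{-1})' \Big[ \sum_{r,s=1,\, r \neq s}^{p} \big(\gamma_{sr}(f,g)(e_r e_r' \otimes e_s e_s') + \rho_{rs}(f,g)(e_r e_s' \otimes e_s e_r')\big) \Big] (I_p \otimes L^{-1}) C'$,

where we let

(3.3) $\gamma_{rs}(f,g) := \int_0^1 \varphi_{f_r}(F_r^{-1}(u))\, \varphi_{g_r}(G_r^{-1}(u))\,du \times \int_0^1 F_s^{-1}(u)\, G_s^{-1}(u)\,du$

and

(3.4) $\rho_{rs}(f,g) := \int_0^1 F_r^{-1}(u)\, \varphi_{g_r}(G_r^{-1}(u))\,du \times \int_0^1 \varphi_{f_s}(F_s^{-1}(u))\, G_s^{-1}(u)\,du$,

with $F_r$ and $G_r$ denoting the cumulative distribution functions
associated with $f_r$ and $g_r$, respectively. We also let
$\Gamma^*_{L,f;2} := \Gamma^*_{L,f,f;2}$ and $G_f := G_{f,f}$, which involve
$\gamma_{rs}(f,f) = \gamma_{rs}(f)$ (see Section 2.2) and
$\rho_{rs}(f,f) = 1$. We then have the following result (see
Appendix B for a proof).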
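The identities $\gamma_{rs}(f,f) = \mathcal{I}_{f_r}\sigma^2_{f_s}$ and $\rho_{rs}(f,f) = 1$ can be checked numerically from (3.3) and (3.4). The sketch below uses logistic marginals purely as an illustrative assumption: for the standard logistic density, $F^{-1}(u) = \log(u/(1-u))$ and $\varphi_f(F^{-1}(u)) = 2u - 1$, with $\mathcal{I}_f = 1/3$ and $\sigma^2_f = \pi^2/3$. The function names and the midpoint quadrature are choices made for this example.

```python
import numpy as np

def cross_information(f_r, f_s, g_r, g_s, m=200_000):
    """Midpoint-rule approximation of gamma_rs(f, g) and rho_rs(f, g).

    Each argument maps u in (0, 1) to a pair (quantile, score at quantile),
    i.e. (F^{-1}(u), phi(F^{-1}(u))) for the corresponding marginal density.
    """
    u = (np.arange(m) + 0.5) / m
    Qfr, Jfr = f_r(u)
    Qfs, Jfs = f_s(u)
    Qgr, Jgr = g_r(u)
    Qgs, Jgs = g_s(u)
    gamma = np.mean(Jfr * Jgr) * np.mean(Qfs * Qgs)   # (3.3)
    rho = np.mean(Qfr * Jgr) * np.mean(Jfs * Qgs)     # (3.4)
    return gamma, rho

def logistic(u):
    """Quantile function and score-at-quantile of the standard logistic density."""
    return np.log(u / (1 - u)), 2 * u - 1

gamma, rho = cross_information(logistic, logistic, logistic, logistic)
print(gamma, np.pi ** 2 / 9)   # gamma_rs(f, f) versus I_f * sigma_f^2
print(rho)                     # close to one
```

Passing different functions for the `g_r`, `g_s` arguments gives the misspecified coefficients $\gamma_{rs}(f,g)$ and $\rho_{rs}(f,g)$, provided the quantile function and score of $g$ are available in closed form.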

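Theorem 3.1. Fix $f \in \mathfrak{F}_{\mathrm{ulan}}$. Then, (i) under
$P^{(n)}_{\vartheta_0,g}$ and under $P^{(n)}_{\vartheta_0 + n^{-1/2}\tau,g}$,
with $\vartheta_0 = (\mu', (\mathrm{vecd}^\circ L_0)')'$,
$\tau = (\tau_1', \tau_2')' \in \mathbb{R}^p \times \mathbb{R}^{p(p-1)}$ and
$g \in \mathcal{F}_{\mathrm{ulan}}$,

$Q_f \xrightarrow{\mathcal{L}} \chi^2_{p(p-1)}$ and $Q_f \xrightarrow{\mathcal{L}} \chi^2_{p(p-1)}\big(\tau_2' (\Gamma^*_{L_0,f,g;2})' (\Gamma^*_{L_0,f;2})^{-1} \Gamma^*_{L_0,f,g;2}\, \tau_2\big)$,

respectively, as $n \to \infty$. (ii) The sequence of tests
$\underline{\phi}_f$ has asymptotic level $\alpha$ under
$\bigcup_{\mu \in \mathbb{R}^p} \bigcup_{g \in \mathcal{F}_{\mathrm{ulan}}} \{P^{(n)}_{\mu,L_0,g}\}$.
(iii) The sequence of tests $\underline{\phi}_f$ is semiparametrically
efficient, still at asymptotic level $\alpha$, when testing
$\mathcal{H}_0 : L = L_0$ against $\mathcal{H}_1^f : L \neq L_0$ with noise
density $f$ (i.e., when testing
$\bigcup_{\mu \in \mathbb{R}^p} \bigcup_{g \in \mathcal{F}_{\mathrm{ulan}}} \{P^{(n)}_{\mu,L_0,g}\}$
against
$\bigcup_{\mu \in \mathbb{R}^p} \bigcup_{L \in \mathcal{M}_{1p} \setminus \{L_0\}} \{P^{(n)}_{\mu,L,f}\}$).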

The test $\underline{\phi}_f$ achieves semiparametric efficiency at $f$
[Theorem 3.1(iii)], and also at any $f_\sigma$, with
$f_\sigma(z) := \prod_{r=1}^{p} \sigma_r^{-1} f_r(z_r/\sigma_r)$, where
$\sigma_r > 0$ for all $r$; it can indeed be checked that
$\underline{\phi}_{f_\sigma} = \underline{\phi}_f$. Most importantly, Theorem
3.1 shows also that $\underline{\phi}_f$ remains valid under any
$g \in \mathcal{F}_{\mathrm{ulan}}$. By proceeding as in Lemma 4.2 of [19],
this can even be extended to any $g \in \mathcal{F}$, which allows us to
avoid any finite moment condition.

This is to be compared to the semiparametric approach of Chen and Bickel
[6]; these authors focus on point estimation, but their methodology also
leads to tests that enjoy the same properties as their estimators. Their
procedures achieve uniform (in $g$) semiparametric efficiency, while our
methods achieve semiparametric efficiency at the target density $f$ only
(more precisely, at any corresponding $f_\sigma$). However, it turns out
that the performances of our procedures do not depend much on the target
density $f$, so that our procedures are close to achieving uniform (in
$g$) semiparametric efficiency; see the simulations in the supplemental
article [16]. Like any uniformly semiparametrically efficient procedure
(see [1]), Chen and Bickel's procedures require estimating $g$, hence
choosing various smoothing parameters. In contrast, our procedures, by
construction, are invariant (here, signed-rank) ones. As such, they do
not require us to estimate densities, and they are robust, easy to
compute, etc.

One might still object that the choice of $f$ is quite arbitrary. This
choice should be based on the practitioner's prior belief on the
underlying densities. If he/she has no such prior belief, a kernel
density estimate $\hat{f}$ could be used. The resulting test
$\underline{\phi}_{\hat{f}}$ would then enjoy the same properties as any
$\underline{\phi}_f$ in terms of validity, since kernel density
estimators, in the symmetric case considered, typically are measurable
with respect to the order statistics of the $|Z_{ir}(\vartheta_0)|$'s,
which, asymptotically, are stochastically independent of the signed
ranks $S_{ir}(\vartheta_0)$, $R^+_{ir}(\vartheta_0)$ used in
$\underline{\phi}_f$; see [11] for details. The test
$\underline{\phi}_{\hat{f}}$ would further achieve uniform semiparametric
efficiency.

Further results on the proposed tests are given in the supplemental article 
[16]. More precisely, a simple explicit expression of the test statistics, local 
asymptotic powers of the corresponding tests, and simulation results can be 
found there. 

We finish this section by describing the extension of our signed-rank
tests to the problem of testing a fixed (arbitrary) linear hypothesis on
$L$, which includes many instances of high practical relevance (we
mentioned a few in the Introduction). Denoting by $\mathcal{V}(\Omega)$ the
vector space that is spanned by the columns of the $p(p-1) \times \ell$
matrix $\Omega$ (which is assumed to have full rank $\ell$), we consider
the testing problem

(3.5) $\mathcal{H}_0(L_0, \Omega) : (\mathrm{vecd}^\circ L) \in (\mathrm{vecd}^\circ L_0) + \mathcal{V}(\Omega)$
      $\mathcal{H}_1(L_0, \Omega) : (\mathrm{vecd}^\circ L) \notin (\mathrm{vecd}^\circ L_0) + \mathcal{V}(\Omega)$,

for some fixed $L_0 \in \mathcal{M}_{1p}$. If one forgets about the tacitly
assumed constraint that $L \in \mathcal{M}_{1p}$ in (3.5), the null
hypothesis above imposes a set of linear constraints on $L$. This
clearly includes all testing problems mentioned in the Introduction:
testing that a given column of $L$ is equal to a fixed vector, testing
that a given (off-diagonal) entry of $L$ is zero and testing
block-diagonality of $L$.

Inspired by the tests from [18] (Section 10.9), the analog of our
signed-rank test $\underline{\phi}_f$ above then rejects
$\mathcal{H}_0(L_0, \Omega)$ for large values of

$Q_f(L_0, \Omega) := (\underline{\Delta}^*_{\hat\vartheta,f;2})'\, P_\Omega\, \underline{\Delta}^*_{\hat\vartheta,f;2}$,

with $P_\Omega := (\Gamma^*_{\hat{L},f;2})^- - \Omega(\Omega' \Gamma^*_{\hat{L},f;2} \Omega)^- \Omega'$,
where $B^-$ denotes the Moore-Penrose pseudoinverse of $B$, and where
$\hat\vartheta = (\hat\mu', (\mathrm{vecd}^\circ \hat{L})')'$ is an estimator
of $\vartheta$ that is locally and asymptotically discrete, root-n
consistent under the null, and constrained, in the sense that $\hat{L}$
satisfies the linear constraints in $\mathcal{H}_0(L_0, \Omega)$.

It can be shown that this signed-rank test achieves semiparametric
optimality at $f$ (the relevant optimality concept here is most
stringency; see, e.g.,






P. ILMONEN AND D. PAINDAVEINE 



[19] for a discussion) and remains valid under any
$g \in \mathcal{F}_{\mathrm{ulan}}$. Its null asymptotic distribution is still
chi-square, now with $r := \mathrm{Trace}[P_\Omega \Gamma^*_{L,f;2}]$ degrees
of freedom (this directly follows from Theorem 9.2.1 in [24] and
Theorem A.1); at asymptotic level $\alpha$, the resulting asymptotic
critical value (that actually does not depend on the true value $L$)
therefore is $\chi^2_{r,1-\alpha}$. Just as for the tests
$\underline{\phi}_f$, it is still possible to compute asymptotic powers
under sequences of local alternatives. It is clear, however, that a
thorough study of the properties of the tests above, for a general
linear hypothesis, is beyond the scope of the present paper, hence is
left for future research. In the important particular case of testing
block-diagonality of $L$, a complete investigation of the signed-rank
tests can be found in [19].
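The quadratic form and its degrees of freedom can be sketched as follows, assuming the projection-type matrix has the form $P_\Omega = \Gamma^- - \Omega(\Omega'\Gamma\Omega)^-\Omega'$. The inputs here are random placeholders rather than actual rank-based quantities, and the function name `constrained_quadratic_form` is hypothetical. When $\Gamma$ is invertible and $\Omega$ has full column rank $\ell$, the trace of $P_\Omega\Gamma$ equals $p(p-1) - \ell$.

```python
import numpy as np

def constrained_quadratic_form(delta, Gamma, Omega):
    """Return Q = delta' P delta and the degrees of freedom trace(P Gamma)."""
    inner = np.linalg.pinv(Omega.T @ Gamma @ Omega)
    P = np.linalg.pinv(Gamma) - Omega @ inner @ Omega.T
    return float(delta @ P @ delta), float(np.trace(P @ Gamma))

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 6))
Gamma = M @ M.T + np.eye(6)          # placeholder symmetric positive definite matrix, p(p-1) = 6
Omega = rng.normal(size=(6, 2))      # placeholder constraint directions, ell = 2
Q, r = constrained_quadratic_form(rng.normal(size=6), Gamma, Omega)
print(Q, r)                          # r is 6 - 2 = 4 up to rounding

# Directions of the form Gamma @ Omega @ c are projected out entirely.
Q0, _ = constrained_quadratic_form(Gamma @ Omega @ np.ones(2), Gamma, Omega)
print(abs(Q0) < 1e-6)
```

The second check reflects the purpose of the construction: local shifts of the central sequence that stay inside the null are not counted as evidence against it.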

4. Point estimation. We turn to the problem of estimating $L$, which is
of primary importance for applications. Denoting by $Q_f = Q_f(L_0)$ the
signed-rank test statistic for $\mathcal{H}_0 : L = L_0$ in (3.1), a natural
signed-rank estimator of $L$ is obtained by "inverting the corresponding
test,"

$\hat{L}_{f;\mathrm{argmin}} = \arg\min_{L \in \mathcal{M}_{1p}} Q_f(L)$.

This estimator, however, is not satisfactory: like any signed-rank
quantity, the objective function $L \mapsto Q_f(L)$ is piecewise
constant, hence discontinuous and nonconvex, which makes it very
difficult to derive the asymptotic properties of
$\hat{L}_{f;\mathrm{argmin}}$. It is also virtually impossible to compute
$\hat{L}_{f;\mathrm{argmin}}$ in practice, since this lack of smoothness and
convexity essentially forces computing the estimator by simply running
over a grid of possible values of the $p(p-1)$-dimensional parameter
$L$, a strategy that cannot provide a reasonable approximation of
$\hat{L}_{f;\mathrm{argmin}}$, even for moderate values of p. Finally, there
is no way to estimate the asymptotic covariance matrix of
$\hat{L}_{f;\mathrm{argmin}}$, which rules out the possibility of deriving
confidence zones for $L$, hence drastically restricts the practical
relevance of this estimator.

In order to avoid the aforementioned drawbacks, we propose adopting a 
one-step approach that was first used in [7] for the problem of estimating 
the shape of an elliptical distribution or in [9] in a more general context. The 
resulting one-step signed-rank estimators — in the sequel, we simply speak of 
one-step rank estimators or one-step R-estimators — can easily be computed 
in practice, their asymptotic properties can be derived explicitly, and their 
asymptotic covariance matrix can be estimated consistently. 

4.1. One-step R-estimators of L. To initiate the one-step procedure, a
preliminary estimator is needed. In the present context, we will assume
that a root-n consistent and locally asymptotically discrete estimator
$\tilde\vartheta = (\tilde\mu', (\mathrm{vecd}^\circ \tilde{L})')'$ is
available. As we will show, the asymptotic properties of the proposed
one-step R-estimators will not be affected by the choice of
$\tilde\vartheta$. Practical choices will be provided in Section 5.

Describing our one-step R-estimators requires:

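Assumption (A). For all $r \neq s \in \{1, \ldots, p\}$, we have at our
disposal sequences of estimators $\hat\gamma_{rs}(f)$ and
$\hat\rho_{rs}(f)$ that: (i) are locally asymptotically discrete and
that (ii), for any $g \in \mathcal{F}_{\mathrm{ulan}}$, satisfy
$\hat\gamma_{rs}(f) = \gamma_{rs}(f,g) + o_P(1)$ and
$\hat\rho_{rs}(f) = \rho_{rs}(f,g) + o_P(1)$ as $n \to \infty$, under
$\bigcup_{\vartheta \in \Theta} \{P^{(n)}_{\vartheta,g}\}$.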

Sequences of estimators fulfilling this assumption will be provided in
Section 4.2 below. At this point, just note that plugging in (3.2) the
estimators from Assumption (A) and the preliminary estimator
$\tilde{L}$ defines a statistic, $\hat\Gamma^*_{\tilde{L},f;2}$ say, that
consistently estimates $\Gamma^*_{L,f,g;2}$ under
$\bigcup_{\vartheta \in \Theta} \{P^{(n)}_{\vartheta,g}\}$.

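For any target density $f$, we propose the one-step R-estimator
$\hat{L}_f$, with values in $\mathcal{M}_{1p}$, defined by

(4.1) $\mathrm{vecd}^\circ \hat{L}_f := (\mathrm{vecd}^\circ \tilde{L}) + n^{-1/2} (\hat\Gamma^*_{\tilde{L},f;2})^{-1} \underline{\Delta}^*_{\tilde\vartheta,f;2}$.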

The following result states the asymptotic properties of this estimator (see 
Appendix B for a proof). 

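Theorem 4.1. Let Assumption (A) hold, and fix
$f \in \mathfrak{F}_{\mathrm{ulan}}$. Then (i) under $P^{(n)}_{\vartheta,g}$,
with $\vartheta = (\mu', (\mathrm{vecd}^\circ L)')' \in \Theta$ and
$g \in \mathcal{F}_{\mathrm{ulan}}$, we have that

(4.2) $\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L}_f - L) = (\Gamma^*_{L,f,g;2})^{-1} \underline{\Delta}^*_{\vartheta,f;2} + o_P(1)$

(4.3) $\qquad = (\Gamma^*_{L,f,g;2})^{-1} \Delta^*_{\vartheta,f,g;2} + o_P(1)$

(4.4) $\qquad \xrightarrow{\mathcal{L}} \mathcal{N}_{p(p-1)}\big(0, (\Gamma^*_{L,f,g;2})^{-1} \Gamma^*_{L,f;2} ((\Gamma^*_{L,f,g;2})^{-1})'\big)$

as $n \to \infty$, where $\Delta^*_{\vartheta,f,g;2}$ is defined in Theorem
A.1 (see Appendix A). (ii) The estimator $\hat{L}_f$ is
semiparametrically efficient at $f$.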

The result in (4.2) justifies calling $\hat{L}_f$ an R-estimator since
it shows that $n^{1/2}(\hat{L}_f - L)$ is asymptotically equivalent to a
random matrix that is measurable with respect to the signed ranks
$S_i(\vartheta)$, $R_i^+(\vartheta)$ in (2.7). The asymptotic equivalence in
(4.3) gives a Bahadur-type representation result for $\hat{L}_f$ with
summands that are independent and identically distributed, hence leads
trivially to the asymptotic normality result in (4.4). Recalling that
$\hat\Gamma^*_{\tilde{L},f;2}$ consistently estimates
$\Gamma^*_{L,f,g;2}$ under
$\bigcup_{\vartheta \in \Theta} \{P^{(n)}_{\vartheta,g}\}$, it is clear that
asymptotic (signed-rank) confidence zones for $L$ may easily be obtained
from this asymptotic normality result.
For $r \neq s \in \{1, \ldots, p\}$, define $\hat\alpha_{rs}(f)$ and
$\hat\beta_{rs}(f)$ as the statistics obtained by plugging the estimators
$\hat\gamma_{rs}(f)$ and $\hat\rho_{rs}(f)$ from Assumption (A) in

(4.5) $\alpha_{rs}(f,g) := \dfrac{\gamma_{rs}(f,g)}{\gamma_{rs}(f,g)\gamma_{sr}(f,g) - \rho_{rs}(f,g)\rho_{sr}(f,g)}$ and $\beta_{rs}(f,g) := \dfrac{-\rho_{rs}(f,g)}{\gamma_{rs}(f,g)\gamma_{sr}(f,g) - \rho_{rs}(f,g)\rho_{sr}(f,g)}$,

and let $\hat\alpha_{rr}(f) := 0 =: \hat\beta_{rr}(f)$, $r = 1, \ldots, p$.
The estimator $\hat{L}_f$ then admits the following explicit expression
(see Appendix B for a proof).

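Theorem 4.2. Let Assumption (A) hold, and fix $f\in\mathcal{F}_{\mathrm{ulan}}$. Let $N_f := (\hat{A}_f'\odot\underline{T}_{\tilde{\vartheta},f}) + (\hat{B}_f'\odot\underline{T}'_{\tilde{\vartheta},f})$, where we let $\hat{A}_f := (\hat{\alpha}_{rs}(f))$ and $\hat{B}_f := (\hat{\beta}_{rs}(f))$. Then the estimator $\hat{L}_f$ rewrites

(4.6) $\hat{L}_f = \tilde{L} + \dfrac{1}{\sqrt{n}}\,\tilde{L}\,[N_f - \mathrm{diag}(\tilde{L}N_f)],$

where $\mathrm{diag}(A) = A - \mathrm{odiag}(A)$ stands for the diagonal matrix with the same diagonal entries as $A$.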

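It is straightforward to check that the role of the term $-\frac{1}{\sqrt{n}}\tilde{L}\,\mathrm{diag}(\tilde{L}N_f)$ in the one-step correction $\frac{1}{\sqrt{n}}\tilde{L}[N_f-\mathrm{diag}(\tilde{L}N_f)]$ of $\tilde{L}$ is merely to ensure that the diagonal entries of $\hat{L}_f$ remain equal to one, hence that $\hat{L}_f$ takes values in $\mathcal{M}_{1p}$ (for $n$ large enough).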
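
The effect of the diagonal correction in (4.6) can be checked numerically. The following is only a minimal sketch: the function name `one_step_update` and the randomly generated inputs are illustrative assumptions, not part of the procedure described here, and the matrix `N` merely stands in for $N_f$ (which in practice is built from signed ranks).

```python
import numpy as np

def one_step_update(L_prelim, N, n):
    """Return L_prelim + n^{-1/2} L_prelim [N - diag(L_prelim N)], cf. (4.6).

    L_prelim : (p, p) preliminary estimate with unit diagonal (assumed)
    N        : (p, p) matrix standing in for N_f (hypothetical input here)
    n        : sample size
    """
    LN = L_prelim @ N
    # diag(.) keeps only the diagonal entries of its argument
    correction = L_prelim @ (N - np.diag(np.diag(LN)))
    return L_prelim + correction / np.sqrt(n)

# Illustrative inputs only: random matrices, not rank-based statistics.
rng = np.random.default_rng(0)
p, n = 3, 1000
L_prelim = rng.normal(size=(p, p))
np.fill_diagonal(L_prelim, 1.0)   # unit diagonal, as for matrices in M_{1p}
N = rng.normal(size=(p, p))
np.fill_diagonal(N, 0.0)          # alpha_rr = beta_rr = 0 gives a zero diagonal

L_new = one_step_update(L_prelim, N, n)
```

Because the preliminary matrix has unit diagonal, the diagonal of `L_prelim @ np.diag(np.diag(LN))` coincides with that of `LN`, so the correction has zero diagonal and `L_new` keeps a unit diagonal.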

As shown above, the estimator $\hat{L}_f$ enjoys very nice properties: its asymptotic behavior is completely characterized, it is semiparametrically efficient under correctly specified densities, yet remains root-$n$ consistent and asymptotically normal under a broad range of densities $g$, its asymptotic covariance matrix can easily be estimated consistently, etc.

However, $\hat{L}_f$ requires estimates $\hat{\gamma}_{rs}(f)$ and $\hat{\rho}_{rs}(f)$ that fulfill Assumption (A). We now provide such estimates.



4.2. Estimation of cross-information coefficients. Of course, it is always possible to estimate consistently the cross-information coefficients $\gamma_{rs}(f,g)$ and $\rho_{rs}(f,g)$ by replacing $g$ in (3.3) and (3.4) with appropriate window or kernel density estimates; this can be achieved since the residuals $Z_{ir}(\tilde{\vartheta})$, $i=1,\ldots,n$, typically are asymptotically i.i.d. with density $g_r$. Rank-based methods, however, intend to eliminate the nuisance $g$ through invariance arguments, without estimating it, so that density estimation methods simply are antinomic to the spirit of rank-based methods.
Therefore, we rather propose a solution that is based on ranks and avoids estimating the underlying nuisance $g$. The method, which relies on the asymptotic linearity, under $g$, of an appropriate rank-based statistic $\underline{S}_{\vartheta,f}$, was first used in [7], where there is only one cross-information coefficient $\mathcal{J}(f,g)$ to be estimated. There, it is crucial that $\mathcal{J}(f,g)$ is involved as a scalar factor in the asymptotic covariance matrix, under $g$, between the rank-based efficient central sequence $\underline{\Delta}^*_{\vartheta,f}$ and the parametric central sequence $\Delta_{\vartheta,g}$. In [5], the method was extended to allow for the estimation of a cross-information coefficient that appears as a scalar factor in the linear term of the asymptotic linearity, under $g$, of a (possibly vector-valued) rank-based statistic $\underline{S}_{\vartheta,f}$.

In all cases, thus, this method was only used to estimate a single cross-information coefficient that appears as a scalar factor in some structural (typically, cross-information) matrix. In this respect, our problem, which requires us to estimate $2p(p-1)$ cross-information quantities appearing in various entries of the cross-information matrix $\Gamma^*_{L,f,g;2}$, is much more complex. Yet, as we now show, it allows for a solution relying on the same basic idea of exploiting the asymptotic linearity, under $g$, of an appropriate $f$-score rank-based statistic.

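Based on the preliminary estimator $\tilde{\vartheta} := (\tilde{\mu}',(\mathrm{vecd}^\circ\tilde{L})')'$ at hand, define $\tilde{\vartheta}^{\gamma_{rs}}_\lambda := (\tilde{\mu}',(\mathrm{vecd}^\circ\tilde{L}^{\gamma_{rs}}_\lambda)')'$, $\lambda\geq 0$, with

$\tilde{L}^{\gamma_{rs}}_\lambda := \tilde{L} + n^{-1/2}\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,\tilde{L}\,(e_re_s' - \mathrm{diag}(\tilde{L}e_re_s')),$

and $\tilde{\vartheta}^{\rho_{rs}}_\lambda := (\tilde{\mu}',(\mathrm{vecd}^\circ\tilde{L}^{\rho_{rs}}_\lambda)')'$, $\lambda\geq 0$, with

$\tilde{L}^{\rho_{rs}}_\lambda := \tilde{L} + n^{-1/2}\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{sr}\,\tilde{L}\,(e_re_s' - \mathrm{diag}(\tilde{L}e_re_s'));$

note that, at $\lambda=0$, $\tilde{\vartheta}^{\gamma_{rs}}_0 = \tilde{\vartheta}^{\rho_{rs}}_0 = \tilde{\vartheta}$. We then have the following result that is crucial for the construction of the estimators $\hat{\gamma}_{rs}(f)$ and $\hat{\rho}_{rs}(f)$; see Appendix B for a proof.

Lemma 4.1. Fix $\vartheta\in\Theta$, $f\in\mathcal{F}_{\mathrm{ulan}}$, $g\in\mathcal{F}_{\mathrm{ulan}}$ and $r\neq s\in\{1,\ldots,p\}$. Then $h^{\gamma_{rs}}(\lambda) := (\underline{T}_{\tilde{\vartheta},f})_{rs}(\underline{T}_{\tilde{\vartheta}^{\gamma_{rs}}_\lambda,f})_{rs} = (1-\lambda\gamma_{rs}(f,g))((\underline{T}_{\tilde{\vartheta},f})_{rs})^2 + o_{\mathrm{P}}(1)$ and $h^{\rho_{rs}}(\lambda) := (\underline{T}_{\tilde{\vartheta},f})_{sr}(\underline{T}_{\tilde{\vartheta}^{\rho_{rs}}_\lambda,f})_{sr} = (1-\lambda\rho_{rs}(f,g))((\underline{T}_{\tilde{\vartheta},f})_{sr})^2 + o_{\mathrm{P}}(1)$ as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$.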

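The mappings $\lambda\mapsto h^{\gamma_{rs}}(\lambda)$ and $\lambda\mapsto h^{\rho_{rs}}(\lambda)$ assume a positive value at $\lambda=0$, and, as shown by Lemma 4.1, are, up to $o_{\mathrm{P}}(1)$'s as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$, monotone decreasing functions that become negative at $\lambda=(\gamma_{rs}(f,g))^{-1}$ and $\lambda=(\rho_{rs}(f,g))^{-1}$, respectively. Restricting to a grid of values of the form $\lambda_j = j/c$ for some large discretization constant $c$ (which is needed to achieve the required discreteness), this naturally leads, via linear interpolation, to the estimators $\hat{\gamma}_{rs}(f)$ and $\hat{\rho}_{rs}(f)$ defined through

(4.7) $(\hat{\gamma}_{rs}(f))^{-1} := \lambda_{\gamma_{rs}} := \lambda^-_{\gamma_{rs}} + \dfrac{(\lambda^+_{\gamma_{rs}}-\lambda^-_{\gamma_{rs}})\,h^{\gamma_{rs}}(\lambda^-_{\gamma_{rs}})}{h^{\gamma_{rs}}(\lambda^-_{\gamma_{rs}})-h^{\gamma_{rs}}(\lambda^+_{\gamma_{rs}})} = \lambda^-_{\gamma_{rs}} + \dfrac{c^{-1}\,h^{\gamma_{rs}}(\lambda^-_{\gamma_{rs}})}{h^{\gamma_{rs}}(\lambda^-_{\gamma_{rs}})-h^{\gamma_{rs}}(\lambda^+_{\gamma_{rs}})},$

with $\lambda^-_{\gamma_{rs}} := \inf\{j\in\mathbb{N} : h^{\gamma_{rs}}(\lambda_{j+1})<0\}$ and $\lambda^+_{\gamma_{rs}} := \lambda^-_{\gamma_{rs}} + \frac{1}{c}$, and

(4.8) $(\hat{\rho}_{rs}(f))^{-1} := \lambda_{\rho_{rs}} := \lambda^-_{\rho_{rs}} + \dfrac{c^{-1}\,h^{\rho_{rs}}(\lambda^-_{\rho_{rs}})}{h^{\rho_{rs}}(\lambda^-_{\rho_{rs}})-h^{\rho_{rs}}(\lambda^+_{\rho_{rs}})},$

with $\lambda^-_{\rho_{rs}} := \inf\{j\in\mathbb{N} : h^{\rho_{rs}}(\lambda_{j+1})<0\}$ and $\lambda^+_{\rho_{rs}} := \lambda^-_{\rho_{rs}} + \frac{1}{c}$. We have the following result (see the supplemental article [16] for a proof).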

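Theorem 4.3. Fix $\vartheta\in\Theta$, $f\in\mathcal{F}_{\mathrm{ulan}}$, and $g\in\mathcal{F}_{\mathrm{ulan}}$. Assume that $\tilde{\vartheta}$ is such that, for all $\varepsilon>0$, there exist $\delta_\varepsilon>0$ and an integer $N_\varepsilon$ such that

(4.9) $\mathrm{P}^{(n)}_{\vartheta,g}\big[\,|(\underline{T}_{\tilde{\vartheta},f})_{rs}| \geq \delta_\varepsilon\big] \geq 1-\varepsilon$

for all $n\geq N_\varepsilon$, $r\neq s\in\{1,\ldots,p\}$. Then, for any such $r,s$, $\hat{\gamma}_{rs}(f) = \gamma_{rs}(f,g) + o_{\mathrm{P}}(1)$ and $\hat{\rho}_{rs}(f) = \rho_{rs}(f,g) + o_{\mathrm{P}}(1)$, as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$, hence $\hat{\gamma}_{rs}(f)$ and $\hat{\rho}_{rs}(f)$ satisfy Assumption (A).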

We point out that the assumption in (4.9) is extremely mild, as it only requires that there is no couple $(r,s)$, $r\neq s$, for which $(\underline{T}_{\tilde{\vartheta},f})_{rs}$ asymptotically has an atom in zero. It therefore rules out preliminary estimators $\tilde{L}$ defined through the (rank-based) $f$-likelihood equation $(\underline{T}_{\vartheta,f})_{rs} = 0$.
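
The grid search and linear interpolation behind (4.7) and (4.8) can be sketched as follows. This is only an illustration under stated assumptions: the helper `inverse_coefficient` is a hypothetical name, and the function `h` used below is an idealized, noise-free version of the linear behavior given in Lemma 4.1 (the constants 1.478 and 0.7 are arbitrary toy values), not a statistic computed from data.

```python
import numpy as np

def inverse_coefficient(h, c, j_max=10_000):
    """Find the sign change of h on the grid lambda_j = j / c and
    interpolate linearly between the two neighbouring grid points.

    Returns an estimate of the inverse cross-information coefficient.
    """
    for j in range(j_max):
        lam_minus, lam_plus = j / c, (j + 1) / c
        h_minus, h_plus = h(lam_minus), h(lam_plus)
        if h_plus < 0:
            # zero of the straight line through (lam_minus, h_minus), (lam_plus, h_plus)
            return lam_minus + (lam_plus - lam_minus) * h_minus / (h_minus - h_plus)
    raise ValueError("no sign change found on the grid")

# Idealized h (assumption): exactly (1 - lambda * gamma) * T^2, with no o_P(1) term.
gamma_true = 1.478
t_squared = 0.7
h = lambda lam: (1.0 - lam * gamma_true) * t_squared

lam_hat = inverse_coefficient(h, c=50)
gamma_hat = 1.0 / lam_hat
```

For an exactly linear `h` the interpolation recovers the zero without error; with a rank-based `h`, the result is only accurate up to the remainder terms controlled by Lemma 4.1.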

5. Simulations. Here we report simulation results for point estimation 
only — simulation results for hypothesis testing can be found in the supple- 
mental article [16]. Our aim is to both compare the proposed estimators 
with some competitors and to investigate the validity of asymptotic results. 

We used the following competitors: (i) FastICA from [12, 13], which is by far the most commonly used estimate in practice; we used here its deflation-based version with the standard nonlinearity function pow3. (ii) FOBI from [4], which is one of the earliest solutions to the ICA problem and is often used as a benchmark estimate. (iii) The estimate based on two scatter matrices from [20]; here the two scatter matrices used are the regular empirical covariance matrix (COV) and the van der Waerden rank-based estimator (HOP) from [7] (actually, HOP is not a scatter matrix but rather a shape matrix, which is allowed in [20]). Root-$n$ consistency of the resulting estimates $\hat{L}_{\mathrm{FICA}}$, $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$ of $L$ requires finite sixth-, eighth- and fourth-order moments, respectively, and follows from [14, 15] and [21].

We focused on the bivariate case $p=2$, and we generated, for three different setups indexed by $d\in\{1,2,3\}$, $M=2{,}000$ independent random samples $Z^{(d,m)}_i = (Z^{(d,m)}_{i1}, Z^{(d,m)}_{i2})'$, $i=1,\ldots,n$, of size $n=4{,}000$. Denoting by $g^{(d)}(z) = g^{(d)}_1(z_1)\,g^{(d)}_2(z_2)$ the common pdf of $Z^{(d,m)}_i$, $i=1,\ldots,n$, $m=1,\ldots,M$, the marginal densities $g^{(d)}_1$ and $g^{(d)}_2$ were chosen as follows:

(i) In Setup $d=1$, $g^{(d)}_1$ is the pdf of the standard normal distribution ($\mathcal{N}$), and $g^{(d)}_2$ is the pdf of the Student distribution with 5 degrees of freedom ($t_5$);

(ii) In Setup $d=2$, $g^{(d)}_1$ is the pdf of the logistic distribution with scale parameter one (log), and $g^{(d)}_2$ is $t_5$;

(iii) In Setup $d=3$, $g^{(d)}_1$ is $t_8$ and $g^{(d)}_2$ is $t_5$.

We chose to use $L = I_2$ and $\mu = (0,0)'$, so that the observations are given by $X^{(d,m)}_i = LZ^{(d,m)}_i + \mu = Z^{(d,m)}_i$ (other values of $L$ and $\mu$ led to extremely similar results).

For each sample, we computed the competing estimates $\hat{L}_{\mathrm{FICA}}$, $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$ defined above. Each of these was also used as a preliminary estimator $\tilde{L}$ in the construction of three $R$-estimators: $\hat{L}_{f^{(j)}}$, $j=1,2,3$, with $f^{(j)} = g^{(j)}$ for all $j$. In the resulting nine $R$-estimators, we used the location estimate $\tilde{\mu} = \tilde{L}\,\mathrm{Med}[\tilde{L}^{-1}X_1,\ldots,\tilde{L}^{-1}X_n]$, based on the preliminary estimate $\tilde{L}$ used to initiate the one-step procedure.
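
As an illustration of this location estimate, here is a minimal sketch. It assumes that Med denotes the componentwise median, which the text above does not spell out; the function name `location_estimate` and the toy data are hypothetical.

```python
import numpy as np

def location_estimate(X, L_prelim):
    """Compute L_prelim @ Med[L_prelim^{-1} X_1, ..., L_prelim^{-1} X_n].

    X        : (n, p) array whose rows are the observations
    L_prelim : (p, p) invertible preliminary mixing matrix estimate
    Med is taken componentwise (an assumption of this sketch).
    """
    Z = np.linalg.solve(L_prelim, X.T).T      # rows are L_prelim^{-1} X_i
    return L_prelim @ np.median(Z, axis=0)

# Toy data: three symmetric source points, mixed and shifted.
L_toy = np.array([[1.0, 0.5],
                  [0.2, 1.0]])
mu_toy = np.array([3.0, -1.0])
Z_toy = np.array([[-1.0, -2.0],
                  [0.0, 0.0],
                  [1.0, 2.0]])
X_toy = Z_toy @ L_toy.T + mu_toy

mu_hat = location_estimate(X_toy, L_toy)
```

With the true mixing matrix plugged in and sources whose componentwise medians are zero, the estimate returns the shift exactly.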

Figure 1 reports, for each setup $d$, a boxplot of the $M$ squared errors

(5.1) $\|\hat{L}(X^{(d,m)}_1,\ldots,X^{(d,m)}_n) - L\|^2 = \sum_{r,s=1}^{2}\big(\hat{L}_{rs}(X^{(d,m)}_1,\ldots,X^{(d,m)}_n) - L_{rs}\big)^2$

for each of the twelve estimators $\hat{L}$ considered (the nine $R$-estimators and their three competitors).
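
One replication of Setup 1 and the error criterion (5.1) might be sketched as below. The random seed, the helper name `squared_error`, and the example matrix `L_hat_example` are assumptions made for illustration; no estimator from this paper is implemented here.

```python
import numpy as np

def squared_error(L_hat, L):
    """Squared Frobenius distance between an estimate and the true matrix, cf. (5.1)."""
    return float(np.sum((L_hat - L) ** 2))

rng = np.random.default_rng(1)   # arbitrary seed (assumption)
n = 4000

# Setup 1: first source standard normal, second source Student t with 5 df.
Z = np.column_stack([rng.standard_normal(n),
                     rng.standard_t(5, size=n)])

L = np.eye(2)
mu = np.zeros(2)
X = Z @ L.T + mu                 # observations X_i = L Z_i + mu

# Hypothetical estimate, only to exercise the error criterion.
L_hat_example = np.array([[1.0, 0.1],
                          [-0.2, 1.0]])
err = squared_error(L_hat_example, L)
```

Repeating this over $M$ replications and collecting `err` for each estimator gives the quantities summarized in the boxplots.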

The results show that, in each setup, all $R$-estimators dramatically improve over their competitors. The behavior of the $R$-estimators does not much depend on the preliminary estimator $\tilde{L}$ used. Optimality of $\hat{L}_{f^{(d)}}$ in Setup $d$ is confirmed. Most importantly, as stated for hypothesis testing at the end of Section 3, the performances of the $R$-estimators do not depend much on the target density $f^{(j)}$ adopted, so that one should not worry much about the choice of the target density in practice. Quite surprisingly, $R$-estimators behave remarkably well even when based on preliminary estimators that, due to heavy tails, fail to be root-$n$ consistent.
[Figure 1 panels: Competitors (FICA, FOBI, COV_HOP); prelim=FICA; prelim=FOBI; prelim=COV_HOP.]

Fig. 1. Boxplots of the squared errors $\|\hat{L}-L\|^2$ [see (5.1)] obtained in $M=2{,}000$ replications from setups $d=1,2,3$ (associated with underlying distributions $g^{(d)}$, $d=1,2,3$) for the competitors $\hat{L}_{\mathrm{FICA}}$, $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$, and the nine $R$-estimators $\hat{L}_{f^{(j)}}$ resulting from all combinations of a target density $f^{(j)} = g^{(j)}$, $j=1,2,3$, and one of the three preliminary estimators $\hat{L}_{\mathrm{FICA}}$, $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$; see Section 5 for details. The sample size is $n=4{,}000$.



In order to investigate small-sample behavior of the estimates, we reran the exact same simulation with sample size $n=800$; in ICA, where most applications involve sample sizes that are not in hundreds, but much larger, this sample size can indeed be considered small. Results are reported in Figure 2. They indicate that, in Setups 2 and 3, $R$-estimators still improve significantly over their competitors, and particularly over $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$. In Setup 1, there seems to be no improvement. Compared to results for $n=4{,}000$, the behavior of one-step $R$-estimators here depends more on the preliminary estimator used. Performances of $R$-estimators again do not depend crucially on the target density, and optimality under correctly specified densities is preserved in most cases.

As a conclusion, for practical sample sizes, the proposed $R$-estimators outperform the standard competitors considered, and their behavior is well in line with our asymptotic results.

Fig. 2. The same boxplots as in Figure 1, but based on sample size $n=800$.

Finally, we illustrate the proposed method for estimating cross-information coefficients. We consider again the first 50 replications of our simulation with $n=4{,}000$, and focus on Setup 1 ($g = g^{(1)}$) and the target density $f = f^{(3)}$ ($\neq g$). The cross-information coefficients to be estimated then are $\gamma_{12}(f,g)\approx 1.478$, $\gamma_{21}(f,g)\approx 0.862$, $\rho_{12}(f,g)\approx 1.149$ and $\rho_{21}(f,g)\approx 0.887$. The upper left picture in Figure 3 shows 150 graphs of the mapping $\lambda\mapsto h^{\gamma_{12}}(\lambda)$ (based on $f = f^{(3)}$), among which the 50 pink curves are based on $\tilde{L} = \hat{L}_{\mathrm{FICA}}$, the 50 green curves are based on $\tilde{L} = \hat{L}_{\mathrm{FOBI}}$, and the 50 blue ones are based on $\tilde{L} = \hat{L}_{\mathrm{COV\_HOP}}$. The upper right, bottom left and bottom right pictures of the same figure provide the corresponding graphs for the mappings $\lambda\mapsto h^{\gamma_{21}}(\lambda)$, $\lambda\mapsto h^{\rho_{12}}(\lambda)$, and $\lambda\mapsto h^{\rho_{21}}(\lambda)$, respectively. The value at which each graph crosses the $\lambda$-axis is the resulting estimate of the inverse of the associated cross-information coefficient. To be able to evaluate the results, we plotted, in each picture, a vertical black line at the corresponding theoretical value, namely at $1/\gamma_{12}(f,g)$, $1/\gamma_{21}(f,g)$, $1/\rho_{12}(f,g)$ and $1/\rho_{21}(f,g)$. Clearly, the results are excellent, and there does not seem to be much dependence on the preliminary estimator $\tilde{L}$ used.



APPENDIX A: RANK-BASED EFFICIENT CENTRAL SEQUENCES 

Fig. 3. Top left: 150 graphs of the mapping $\lambda\mapsto h^{\gamma_{12}}(\lambda)$ based on $f = f^{(3)}$, associated with the first 50 replications from Setup 1 ($g = g^{(1)}$) in Figure 1 (sample size is $n=4{,}000$): the 50 curves in pink, green, and blue are based on the preliminary estimators $\hat{L}_{\mathrm{FICA}}$, $\hat{L}_{\mathrm{FOBI}}$ and $\hat{L}_{\mathrm{COV\_HOP}}$, respectively. Top right, bottom left, and bottom right: the corresponding plots for the mappings $\lambda\mapsto h^{\gamma_{21}}(\lambda)$, $\lambda\mapsto h^{\rho_{12}}(\lambda)$ and $\lambda\mapsto h^{\rho_{21}}(\lambda)$, respectively.

In this first Appendix, we study the asymptotic behavior of the rank-based efficient central sequences $\underline{\Delta}^*_{\vartheta,f;2}$. The main result is the following (see Appendix B for a proof).



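Theorem A.1. Fix $\vartheta = (\mu',(\mathrm{vecd}^\circ L)')'\in\Theta$ and $f\in\mathcal{F}_{\mathrm{ulan}}$. Then, (i) for any $g\in\mathcal{F}$,

$\underline{\Delta}^*_{\vartheta,f;2} = \Delta^*_{\vartheta,f,g;2} + o_{L^2}(1)$

as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$, where $\Delta^*_{\vartheta,f,g;2} := C(I_p\otimes L^{-1})'\,\mathrm{vec}\big[\mathrm{odiag}\big(\frac{1}{\sqrt{n}}\sum_{i=1}^n (S_i\odot\varphi_f(F_+^{-1}(G_+(|Z_i|))))(S_i\odot F_+^{-1}(G_+(|Z_i|)))'\big)\big]$. (ii) Under $\mathrm{P}^{(n)}_{\vartheta+n^{-1/2}\tau,g}$, with $\tau = (\tau_1',\tau_2')'\in\mathbb{R}^p\times\mathbb{R}^{p(p-1)}$ and $g\in\mathcal{F}_{\mathrm{ulan}}$,

$\underline{\Delta}^*_{\vartheta,f;2} \xrightarrow{\mathcal{L}} \mathcal{N}_{p(p-1)}\big(\Gamma^*_{L,f,g;2}\tau_2,\ \Gamma^*_{L,f;2}\big)$

as $n\to\infty$ (for $\tau=0$, the result only requires that $g\in\mathcal{F}$). (iii) Still with $\tau = (\tau_1',\tau_2')'\in\mathbb{R}^p\times\mathbb{R}^{p(p-1)}$ and $g\in\mathcal{F}_{\mathrm{ulan}}$, $\underline{\Delta}^*_{\vartheta+n^{-1/2}\tau,f;2} - \underline{\Delta}^*_{\vartheta,f;2} = -\Gamma^*_{L,f,g;2}\tau_2 + o_{\mathrm{P}}(1)$ as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$.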
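Both for hypothesis testing and point estimation, we had to replace in $\underline{\Delta}^*_{\vartheta,f;2}$ the parameter $\vartheta$ with some estimator ($\hat{\vartheta}^{(n)}$, say). The asymptotic behavior of the resulting (so-called aligned) rank-based efficient central sequence $\underline{\Delta}^*_{\hat{\vartheta}^{(n)},f;2}$ is given in the following result.

Corollary A.1. Fix $\vartheta = (\mu',(\mathrm{vecd}^\circ L)')'\in\Theta$, $f\in\mathcal{F}_{\mathrm{ulan}}$, and $g\in\mathcal{F}_{\mathrm{ulan}}$. Let $\hat{\vartheta} = \hat{\vartheta}^{(n)} = (\hat{\mu}',(\mathrm{vecd}^\circ\hat{L})')'$ be a locally asymptotically discrete sequence of random vectors satisfying $n^{1/2}(\hat{\vartheta}-\vartheta) = O_{\mathrm{P}}(1)$ as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$. Then $\underline{\Delta}^*_{\hat{\vartheta},f;2} - \underline{\Delta}^*_{\vartheta,f;2} = -\Gamma^*_{L,f,g;2}\,n^{1/2}\,\mathrm{vecd}^\circ(\hat{L}-L) + o_{\mathrm{P}}(1)$, still as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$.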

Since the sequence of estimators $\hat{\vartheta}^{(n)}$ is assumed to be locally asymptotically discrete [which means that the number of possible values of $\hat{\vartheta}^{(n)}$ in balls with $O(n^{-1/2})$ radius centered at $\vartheta$ is bounded as $n\to\infty$], this result is a direct consequence of Theorem A.1(iii) and Lemma 4.4 from [17]. Local asymptotic discreteness is a concept that goes back to Le Cam and is quite standard in one-step estimation; see, for example, [2] or [17].

Of course, a sequence of estimators $\hat{\vartheta}^{(n)}$ can always be discretized by replacing each component $(\hat{\vartheta}^{(n)})_\ell$ with

$(\hat{\vartheta}^{(n)}_\#)_\ell := (cn^{1/2})^{-1}\,\mathrm{sign}((\hat{\vartheta}^{(n)})_\ell)\,\big\lceil cn^{1/2}\,|(\hat{\vartheta}^{(n)})_\ell|\big\rceil, \qquad \ell = 1,\ldots,p^2,$

for some arbitrary constant $c>0$. In practice, however, one can safely forget about such discretizations: irrespective of the accuracy of the computer used, the discretization constant $c$ can always be chosen large enough to make discretization irrelevant at the fixed sample size $n_0$ at hand, hence also at any $n>n_0$.
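
The discretization device just described can be sketched in a few lines. This is only an illustration: the function name `discretize` is hypothetical, and the rounding is taken to be a ceiling applied to the absolute value, which is how the brackets in the display above are read here.

```python
import numpy as np

def discretize(theta, n, c):
    """Map each component of theta to the grid (c * sqrt(n))^{-1} * Z,
    rounding the absolute value up and restoring the sign afterwards."""
    scale = c * np.sqrt(n)
    return np.sign(theta) * np.ceil(scale * np.abs(theta)) / scale

theta_example = np.array([0.1234, -0.5678, 0.0])
theta_disc = discretize(theta_example, n=100, c=10)   # grid step 1 / 100
```

Each component moves by at most $(cn^{1/2})^{-1}$, which is why a large $c$ makes the discretization numerically invisible at a fixed sample size.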

APPENDIX B: PROOFS 

B.1. Proofs of Theorems 2.1 and A.1. The proofs of this section make use of the Hájek projection theorem for linear signed-rank statistics (see, e.g., [23], Chapter 3), which states that, if $Y_i = \mathrm{Sign}(Y_i)|Y_i|$, $i=1,\ldots,n$, are i.i.d. with (absolutely continuous) cdf $G$ and if $K:(0,1)\to\mathbb{R}$ is a continuous and square-integrable score function that can be written as the difference of two monotone increasing functions, then

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,K(G_+(|Y_i|))$

(B.1) $= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,K\Big(\dfrac{R^+_i}{n+1}\Big) + o_{L^2}(1)$

(B.2) $= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{Sign}(Y_i)\,\mathrm{E}\big[K(G_+(|Y_i|))\,\big|\,R^+_i\big] + o_{L^2}(1)$

as $n\to\infty$, where $G_+$ stands for the common cdf of the $|Y_i|$'s and $R^+_i$ denotes the rank of $|Y_i|$ among $|Y_1|,\ldots,|Y_n|$. The quantities in (B.1) and (B.2) are linear signed-rank quantities that are said to be based on approximate and exact scores, respectively.
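
For concreteness, a linear signed-rank quantity based on approximate scores, as in (B.1), can be computed as in the following sketch. The function name `signed_rank_statistic`, the identity score function, and the toy sample are illustrative assumptions; ties among the absolute values are assumed absent.

```python
import numpy as np

def signed_rank_statistic(y, score):
    """n^{-1/2} * sum_i Sign(y_i) * score(R_i^+ / (n + 1)),
    where R_i^+ is the rank of |y_i| among |y_1|, ..., |y_n| (no ties assumed)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranks = np.argsort(np.argsort(np.abs(y))) + 1   # ranks 1..n of the absolute values
    return float(np.sum(np.sign(y) * score(ranks / (n + 1))) / np.sqrt(n))

y_toy = [0.5, -2.0, 1.5, 0.1]          # ranks of |y| are 2, 4, 3, 1
stat = signed_rank_statistic(y_toy, score=lambda u: u)
```

Only the signs and the ranks of the absolute values enter the computation, which is what makes such quantities measurable with respect to the signed ranks.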

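In the rest of this section, we fix $\vartheta\in\Theta$, $f\in\mathcal{F}_{\mathrm{ulan}}$, and $g\in\mathcal{F}$. We write throughout $Z_i$, $S_i$, and $R^+_i$ for $Z_i(\vartheta)$, $S_i(\vartheta)$, and $R^+_i(\vartheta)$, respectively. We also write $\mathrm{E}_h$ instead of $\mathrm{E}_{\vartheta,h}$, with $h = f,g$. We then start with the proof of Theorem A.1(i).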

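Proof of Theorem A.1(i). Fix $r\neq s\in\{1,\ldots,p\}$ and two score functions $K_a, K_b:(0,1)\to\mathbb{R}$ with the same properties as $K$ above. Then, by using (i) $\mathrm{E}_g[S_{ir}]=0$, (ii) the independence (under $\mathrm{P}^{(n)}_{\vartheta,g}$) between the $S_{ir}$'s and the $(R^+_{ir},|Z_{ir}|)$'s, and (iii) the independence between the $Z_{ir}$'s and the $Z_{is}$'s, we obtain

$\mathrm{E}_g\Big[\Big(\dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}\Big\{K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|)) - K_a\Big(\dfrac{R^+_{ir}}{n+1}\Big)K_b\Big(\dfrac{R^+_{is}}{n+1}\Big)\Big\}\Big)^2\Big]$

$= \dfrac{1}{n}\sum_{i=1}^n \mathrm{E}_g\Big[\Big(K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|)) - K_a\Big(\dfrac{R^+_{ir}}{n+1}\Big)K_b\Big(\dfrac{R^+_{is}}{n+1}\Big)\Big)^2\Big]$

$\leq 2\,\mathrm{E}_g\Big[\Big(K_a(G_{+r}(|Z_{1r}|)) - K_a\Big(\dfrac{R^+_{1r}}{n+1}\Big)\Big)^2\Big]\,\mathrm{E}_g\big[K_b^2(G_{+s}(|Z_{1s}|))\big] + 2\,\mathrm{E}_g\Big[K_a^2\Big(\dfrac{R^+_{1r}}{n+1}\Big)\Big]\,\mathrm{E}_g\Big[\Big(K_b(G_{+s}(|Z_{1s}|)) - K_b\Big(\dfrac{R^+_{1s}}{n+1}\Big)\Big)^2\Big].$

Consequently, the square integrability of $K_a$, $K_b$, and the convergence to zero of both $\mathrm{E}_g[(K_a(G_{+r}(|Z_{1r}|)) - K_a(\frac{R^+_{1r}}{n+1}))^2]$ and $\mathrm{E}_g[(K_b(G_{+s}(|Z_{1s}|)) - K_b(\frac{R^+_{1s}}{n+1}))^2]$ [which directly follows from (B.1)] entail

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|)) = \dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}K_a\Big(\dfrac{R^+_{ir}}{n+1}\Big)K_b\Big(\dfrac{R^+_{is}}{n+1}\Big) + o_{L^2}(1)$

as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,g}$. Theorem A.1(i) follows by taking $K_a = \varphi_{f_r}\circ F_{+r}^{-1}$ and $K_b = F_{+s}^{-1}$. □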

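We go on with the proof of Theorem 2.1, for which it is important to note that, by proceeding as in the proof of Theorem A.1(i) but with (B.2) instead of (B.1), we further obtain that

$\dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}K_a(G_{+r}(|Z_{ir}|))K_b(G_{+s}(|Z_{is}|))$

(B.3) $= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}K_a\Big(\dfrac{R^+_{ir}}{n+1}\Big)K_b\Big(\dfrac{R^+_{is}}{n+1}\Big) + o_{L^2}(1)$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}\,\mathrm{E}_g\big[K_a(G_{+r}(|Z_{ir}|))\,\big|\,R^+_{ir}\big] \times \mathrm{E}_g\big[K_b(G_{+s}(|Z_{is}|))\,\big|\,R^+_{is}\big] + o_{L^2}(1),$

still as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$.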

Proof of Theorem 2.1. It is sufficient to prove Theorem 2.1(i) only, since, as already mentioned at the end of Section 2.3, Theorem 2.1(ii) follows from (2.6) and Theorem 2.1(i). That is, we have to show that, for any $r,s\in\{1,\ldots,p\}$,



E 



(B.4) 



1 n 

v n i=i 



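as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,f}$. Now, the left-hand side of (B.4) rewrites

$\mathrm{E}_f\Big[\dfrac{1}{\sqrt{n}}\sum_{i=1}^n(\varphi_f(Z_i)Z_i' - I_p)_{rs}\,\Big|\,S_1,\ldots,S_n,R^+_1,\ldots,R^+_n\Big]$

(B.5) $= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{E}_f\big[S_{ir}S_{is}\,\varphi_{f_r}(|Z_{ir}|)\,|Z_{is}| - \delta_{rs}\,\big|\,S_1,\ldots,S_n,R^+_1,\ldots,R^+_n\big]$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \big(S_{ir}S_{is}\,\mathrm{E}_f\big[\varphi_{f_r}(|Z_{ir}|)\,|Z_{is}|\,\big|\,R^+_{1r},\ldots,R^+_{nr},R^+_{1s},\ldots,R^+_{ns}\big] - \delta_{rs}\big).$

For $r\neq s$, this yields

$\mathrm{E}_f\Big[\dfrac{1}{\sqrt{n}}\sum_{i=1}^n(\varphi_f(Z_i)Z_i' - I_p)_{rs}\,\Big|\,S_1,\ldots,S_n,R^+_1,\ldots,R^+_n\Big]$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}\,\mathrm{E}_f\big[\varphi_{f_r}(|Z_{ir}|)\,\big|\,R^+_{1r},\ldots,R^+_{nr}\big]\,\mathrm{E}_f\big[|Z_{is}|\,\big|\,R^+_{1s},\ldots,R^+_{ns}\big]$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n S_{ir}S_{is}\,\varphi_{f_r}\Big(F_{+r}^{-1}\Big(\dfrac{R^+_{ir}}{n+1}\Big)\Big)\,F_{+s}^{-1}\Big(\dfrac{R^+_{is}}{n+1}\Big) + o_{L^2}(1)$

as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,f}$, where we have used (B.3), still with $K_a = \varphi_{f_r}\circ F_{+r}^{-1}$ and $K_b = F_{+s}^{-1}$, but this time at $g = f$. This establishes (B.4) for $r\neq s$. As for $r = s$, (B.5) now entails [writing $K_{ab}(u) := \varphi_{f_r}(F_{+r}^{-1}(u))\times F_{+r}^{-1}(u)$ for all $u$]

$\mathrm{E}_f\Big[\dfrac{1}{\sqrt{n}}\sum_{i=1}^n(\varphi_f(Z_i)Z_i' - I_p)_{rs}\,\Big|\,S_1,\ldots,S_n,R^+_1,\ldots,R^+_n\Big]$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{E}_f\big[\varphi_{f_r}(|Z_{ir}|)\,|Z_{ir}|\,\big|\,R^+_{1r},\ldots,R^+_{nr}\big] - \sqrt{n}$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n \mathrm{E}_f\big[K_{ab}(F_{+r}(|Z_{ir}|))\,\big|\,R^+_{1r},\ldots,R^+_{nr}\big] - \sqrt{n}$

(B.6) $= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n K_{ab}\Big(\dfrac{R^+_{ir}}{n+1}\Big) - \sqrt{n} + o_{L^2}(1)$

$= \dfrac{1}{\sqrt{n}}\sum_{i=1}^n K_{ab}\Big(\dfrac{i}{n+1}\Big) - \sqrt{n} + o_{L^2}(1)$

(B.7) $= \sqrt{n}\int_0^1 K_{ab}(u)\,du - \sqrt{n} + o_{L^2}(1)$

(B.8) $= o_{L^2}(1),$

still as $n\to\infty$, under $\mathrm{P}^{(n)}_{\vartheta,f}$, where (B.6), (B.7) and (B.8) follow from the Hájek projection theorem for linear rank (not signed-rank) statistics (see, e.g., [23], Chapter 2), the square-integrability of $K_{ab}(\cdot)$ (see the proof of Proposition 3.2(i) in [10]), and integration by parts, respectively. This further proves (B.4) for $r = s$, hence also the result. □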



Proof of Theorem A.1(ii) and (iii). (ii) In view of Theorem A.1(i), it is sufficient to show that both asymptotic normality results hold for $\Delta^*_{\vartheta,f,g;2}$. The result under $\mathrm{P}^{(n)}_{\vartheta,g}$ then straightforwardly follows from the multivariate CLT. As for the result under local alternatives [which, just as the result in part (iii), requires that $g\in\mathcal{F}_{\mathrm{ulan}}$], it is obtained as usual, by establishing the joint normality under $\mathrm{P}^{(n)}_{\vartheta,g}$ of $\log(d\mathrm{P}^{(n)}_{\vartheta+n^{-1/2}\tau,g}/d\mathrm{P}^{(n)}_{\vartheta,g})$ and $\Delta^*_{\vartheta,f,g;2}$, then applying Le Cam's third lemma; the required joint normality follows from a routine application of the classical Cramér-Wold device. (iii) The proof, which is long and tedious, is also a quite trivial adaptation of the proof of Proposition A.1 in [7]. We therefore omit it. □

B.2. Proof of Theorem 3.1. (i) Applying Corollary A.l, with $ := 
tf = (/}', (vecd° L )'y and ■& := tf = (//, (vecd° L Q )')' , entails that A\ = 

AJ o j. 2 + op(l) as n — > oo under . Consequently, we have that 
(B.9) Q f = (vecAJ 0i/;2 )'(r^ i/ . 2 )- 1 (v e c AS 0i/;2 ) + o P (l), 

still as n— >oo, under Pi"^„ — hence also under P„ , (from contigu- 

ity). The result then follows from Theorem A.l(ii). (ii) It directly follows 

from (i) that, under the sequence of local alternatives P^ +n -i/2 T j) 4>f^ nas 

asymptotic power 1 - %( p -i)(x% t ( p _ 1 ) )1 _ a ; T 2 r L ,/;2 T 2)- Tnis establishes the 
result, since these local powers coincide with the semiparametrically optimal 
(at /) powers in (2.5). 

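B.3. Proofs of Lemma 4.1, Theorems 4.1 and 4.2.

Proof of Theorem 4.1. (i) Fix $\vartheta\in\Theta$ and $g\in\mathcal{F}_{\mathrm{ulan}}$. From (4.1), the fact that $\hat{\Gamma}^*_{\tilde{L},f;2} - \Gamma^*_{L,f,g;2} = o_{\mathrm{P}}(1)$ as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$, and Corollary A.1, we obtain

$\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L}_f - L) = \sqrt{n}\,\mathrm{vecd}^\circ(\tilde{L} - L) + (\hat{\Gamma}^*_{\tilde{L},f;2})^{-1}\underline{\Delta}^*_{\tilde{\vartheta},f;2}$

$= \sqrt{n}\,\mathrm{vecd}^\circ(\tilde{L} - L) + (\Gamma^*_{L,f,g;2})^{-1}\underline{\Delta}^*_{\tilde{\vartheta},f;2} + o_{\mathrm{P}}(1)$

(B.10) $= (\Gamma^*_{L,f,g;2})^{-1}\underline{\Delta}^*_{\vartheta,f;2} + o_{\mathrm{P}}(1)$

as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$. Consequently, Theorem A.1(i) and (ii) entail that, still as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$,

$\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L}_f - L)$

(B.11) $= (\Gamma^*_{L,f,g;2})^{-1}\Delta^*_{\vartheta,f,g;2} + o_{\mathrm{P}}(1)$

(B.12) $\xrightarrow{\mathcal{L}} \mathcal{N}_{p(p-1)}\big(0,\ (\Gamma^*_{L,f,g;2})^{-1}\,\Gamma^*_{L,f;2}\,(\Gamma^{*\prime}_{L,f,g;2})^{-1}\big).$

Now, by using the fact that $C'(\mathrm{vecd}^\circ H) = (\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, we have that $\sqrt{n}\,\mathrm{vec}(\hat{L}_f - L) = \sqrt{n}\,C'\,\mathrm{vecd}^\circ(\hat{L}_f - L)$, so that (4.2), (4.3) and (4.4) follow from (B.10), (B.11) and (B.12), respectively.

(ii) The asymptotic covariance matrix of $\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L}_f - L)$, under $\mathrm{P}^{(n)}_{\vartheta,f}$, reduces to $(\Gamma^*_{L,f;2})^{-1}$ [let $g = f$ in (B.12)], which establishes the result. □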

To prove Theorem 4.2, we will need the following result.

Lemma B.1. Fix $\vartheta = (\mu',(\mathrm{vecd}^\circ L)')'\in\Theta$ and $f,g\in\mathcal{F}_{\mathrm{ulan}}$. Then

$(I_p\otimes L^{-1})\,C'(\Gamma^*_{L,f,g;2})^{-1}C\,(I_p\otimes L^{-1})'$

$= \sum_{r,s=1,\,r\neq s}^{p}\big\{\alpha_{rs}(f,g)\,\big(e_re_r'\otimes(L_{rs}^2e_re_r' + e_se_s' - L_{rs}e_re_s' - L_{rs}e_se_r')\big)$

$\qquad + \beta_{rs}(f,g)\,\big(e_re_s'\otimes(L_{rs}L_{sr}e_re_s' - L_{rs}e_re_r' - L_{sr}e_se_s' + e_se_r')\big)\big\},$

where $L_{rs}$ denotes the entry $(r,s)$ of $L$.

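Proof of Theorem 4.2. By using again the fact that $C'(\mathrm{vecd}^\circ H) = (\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, and then Lemma B.1, we obtain

$\mathrm{vec}(\hat{L}_f - \tilde{L}) = C'\,\mathrm{vecd}^\circ(\hat{L}_f - \tilde{L})$

$= \dfrac{1}{\sqrt{n}}\,C'(\hat{\Gamma}^*_{\tilde{L},f;2})^{-1}C\,(I_p\otimes\tilde{L}^{-1})'\,\mathrm{vec}\,\underline{T}_{\tilde{\vartheta},f}$

$= \dfrac{1}{\sqrt{n}}\,(I_p\otimes\tilde{L})\Big[\sum_{r,s=1,\,r\neq s}^{p}\big\{\hat{\alpha}_{rs}(f)\,\big(e_re_r'\otimes(\tilde{L}_{rs}^2e_re_r' + e_se_s' - \tilde{L}_{rs}e_re_s' - \tilde{L}_{rs}e_se_r')\big)$

$\qquad + \hat{\beta}_{rs}(f)\,\big(e_re_s'\otimes(\tilde{L}_{rs}\tilde{L}_{sr}e_re_s' - \tilde{L}_{rs}e_re_r' - \tilde{L}_{sr}e_se_s' + e_se_r')\big)\big\}\Big]\times\mathrm{vec}\,\underline{T}_{\tilde{\vartheta},f}.$

Since all diagonal entries of $\underline{T}_{\tilde{\vartheta},f}$ are zeros, we have that

(B.13) $\mathrm{vec}(\hat{L}_f - \tilde{L}) = \dfrac{1}{\sqrt{n}}\,(I_p\otimes\tilde{L})\Big[\sum_{r,s=1,\,r\neq s}^{p}\big\{\hat{\alpha}_{rs}(f)\,\big(e_re_r'\otimes(e_se_s' - \tilde{L}_{rs}e_re_s')\big) + \hat{\beta}_{rs}(f)\,\big(e_re_s'\otimes(e_se_r' - \tilde{L}_{rs}e_re_r')\big)\big\}\Big]\,\mathrm{vec}\,\underline{T}_{\tilde{\vartheta},f}.$

The identity $(C'\otimes A)(\mathrm{vec}\,B) = \mathrm{vec}(ABC)$ then yields

$\mathrm{vec}(\hat{L}_f - \tilde{L}) = \dfrac{1}{\sqrt{n}}\,(I_p\otimes\tilde{L})\,\mathrm{vec}\Big[\sum_{r,s=1,\,r\neq s}^{p}(N_f)_{sr}\,(e_se_r' - \tilde{L}_{rs}e_re_r')\Big].$

Hence, we have

$\hat{L}_f - \tilde{L} = \dfrac{1}{\sqrt{n}}\,\tilde{L}\sum_{r,s=1,\,r\neq s}^{p}(N_f)_{sr}\,(e_se_r' - \tilde{L}_{rs}e_re_r')$

$= \dfrac{1}{\sqrt{n}}\,\tilde{L}\sum_{r,s=1}^{p}(N_f)_{sr}\,(e_se_r' - \tilde{L}_{rs}e_re_r')$

$= \dfrac{1}{\sqrt{n}}\,\tilde{L}\,(N_f - \mathrm{diag}(\tilde{L}N_f)),$

which proves the result. □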

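Proof of Lemma 4.1. In this proof, all stochastic convergences are as $n\to\infty$ under $\mathrm{P}^{(n)}_{\vartheta,g}$. First note that, if $\hat{\vartheta} := (\hat{\mu}',(\mathrm{vecd}^\circ\hat{L})')'$ is an arbitrary locally asymptotically discrete root-$n$ consistent estimator for $\vartheta = (\mu',(\mathrm{vecd}^\circ L)')'$, we then have that

(B.14) $\mathrm{vec}(\underline{T}_{\hat{\vartheta},f} - \underline{T}_{\vartheta,f}) = -G_{f,g}\,(I_p\otimes L^{-1})\,C'\sqrt{n}\,\mathrm{vecd}^\circ(\hat{L} - L) + o_{\mathrm{P}}(1)$

(compare with Corollary A.1). Incidentally, note that (B.14) implies that $\mathrm{vec}\,\underline{T}_{\hat{\vartheta},f}$ is $O_{\mathrm{P}}(1)$ [by proceeding exactly as in the proof of Theorem A.1(i) and (ii), we can indeed show that, under $\mathrm{P}^{(n)}_{\vartheta,g}$, $\mathrm{vec}\,\underline{T}_{\vartheta,f}$ is asymptotically multinormal, hence stochastically bounded].

Now, from (B.14), we obtain

$\mathrm{vec}(\underline{T}_{\tilde{\vartheta}^{\gamma_{rs}}_\lambda,f} - \underline{T}_{\tilde{\vartheta},f})$

$= -G_{f,g}\,(I_p\otimes L^{-1})\,C'\sqrt{n}\,\mathrm{vecd}^\circ(\tilde{L}^{\gamma_{rs}}_\lambda - \tilde{L}) + o_{\mathrm{P}}(1)$

$= -\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,G_{f,g}\,(I_p\otimes L^{-1})\,C'\,\mathrm{vecd}^\circ(\tilde{L}e_re_s' - \tilde{L}\,\mathrm{diag}(\tilde{L}e_re_s')) + o_{\mathrm{P}}(1),$

which, by using the fact that $C'(\mathrm{vecd}^\circ H) = (\mathrm{vec}\,H)$ for any $p\times p$ matrix $H$ with only zero diagonal entries, leads to

$\mathrm{vec}(\underline{T}_{\tilde{\vartheta}^{\gamma_{rs}}_\lambda,f} - \underline{T}_{\tilde{\vartheta},f})$

$= -\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,G_{f,g}\,(I_p\otimes L^{-1})\,\mathrm{vec}(\tilde{L}e_re_s' - \tilde{L}\,\mathrm{diag}(\tilde{L}e_re_s')) + o_{\mathrm{P}}(1)$

$= -\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,G_{f,g}\,\mathrm{vec}(e_re_s' - \mathrm{diag}(\tilde{L}e_re_s')) + o_{\mathrm{P}}(1).$

This yields

$\mathrm{vec}(\underline{T}_{\tilde{\vartheta}^{\gamma_{rs}}_\lambda,f} - \underline{T}_{\tilde{\vartheta},f})$

$= -\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,G_{f,g}\,\mathrm{vec}(e_re_s') + o_{\mathrm{P}}(1)$

$= -\lambda\,(\underline{T}_{\tilde{\vartheta},f})_{rs}\,\big(\gamma_{rs}(f,g)\,\mathrm{vec}(e_re_s') + \rho_{rs}(f,g)\,\mathrm{vec}(e_se_r')\big) + o_{\mathrm{P}}(1).$

Premultiplying by $(\underline{T}_{\tilde{\vartheta},f})_{rs}\,(e_s\otimes e_r)'$, we then obtain

$(\underline{T}_{\tilde{\vartheta},f})_{rs}\,(\underline{T}_{\tilde{\vartheta}^{\gamma_{rs}}_\lambda,f})_{rs} - ((\underline{T}_{\tilde{\vartheta},f})_{rs})^2 = -\lambda\,((\underline{T}_{\tilde{\vartheta},f})_{rs})^2\,\gamma_{rs}(f,g) + o_{\mathrm{P}}(1)$

[recall indeed that $\underline{T}_{\tilde{\vartheta},f} = O_{\mathrm{P}}(1)$], which establishes the $\gamma$-part of the lemma. The proof of the $\rho$-part follows along the exact same lines, but for the fact that the premultiplication is by $(\underline{T}_{\tilde{\vartheta},f})_{sr}\,(e_r\otimes e_s)'$. □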
SUPPLEMENTARY MATERIAL

Further results on tests and a proof of Theorem 4.3 

(DOI: 10.1214/11-AOS906SUPP; .pdf). This supplement provides a simple 
explicit expression for the proposed test statistics, derives local asymptotic 
powers of the corresponding tests, and presents simulation results for hy- 
pothesis testing. It also gives a proof of Theorem 4.3. 